<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TextWise Blog &#187; learning</title>
	<atom:link href="http://blog.textwise.com/tag/learning/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.textwise.com</link>
	<description>A blog about the SemanticHacker API by TextWise</description>
	<lastBuildDate>Wed, 31 Aug 2011 18:50:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Cutting Edge Semantics?</title>
		<link>http://blog.textwise.com/2011/03/15/cutting-edge-semantics/</link>
		<comments>http://blog.textwise.com/2011/03/15/cutting-edge-semantics/#comments</comments>
		<pubDate>Tue, 15 Mar 2011 17:02:04 +0000</pubDate>
		<dc:creator>Clinton Mah</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Opinion]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[accidents]]></category>
		<category><![CDATA[Aristotle]]></category>
		<category><![CDATA[categories]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[predicate]]></category>
		<category><![CDATA[subject]]></category>
		<category><![CDATA[summarization]]></category>
		<category><![CDATA[taxonomy. semantics]]></category>

		<guid isPermaLink="false">http://blog.textwise.com/?p=448</guid>
		<description><![CDATA[Aristotle lived about 2,400 years ago, well before the advent of the World Wide Web. Yet his ideas drive the still-emerging Semantic Web. In fact, we could probably do a better job as modern information scientists if we paid a bit more attention to the ancient Greek philosopher. In his treatise &#8220;Categories,&#8221; Aristotle addressed [...]]]></description>
			<content:encoded><![CDATA[
<p>Aristotle lived about 2,400 years ago, well before the advent of the World Wide Web. Yet his ideas drive the still-emerging Semantic Web. In fact, we could probably do a better job as modern information scientists if we paid a bit more attention to the ancient Greek philosopher.</p>
<p>In his treatise &#8220;Categories,&#8221; Aristotle addressed the problem of meaning in language and developed a logical framework for semantics. In this work, he introduced the theory of subjects and predicates, which modern grammar and formal logic have adopted. This was, in effect, RDF version 0.0.0.</p>
<p>Aristotle also talked about using taxonomies (from the Greek τάξις + νόμος) to define the meanings of concepts, introducing &#8220;genus&#8221; and &#8220;species&#8221; as essential relationships. Linnaeus adopted this approach in the 18th century to catalog the great diversity of life on earth, and more than a hundred years later, formal taxonomies made their way into library science.</p>
<p>Of special interest to us here is Aristotle&#8217;s classification of the predicates associated with definitions of meaning. He defined five types: genus, species, difference, property, and accident. The first two are already familiar to information scientists as IS-A relationships. A difference predicate relates to a defining characteristic for a concept. A property is an important characteristic for a concept, but not sufficient to define it. An accident is a true predicate that makes no contribution to meaning.</p>
<p>For example,</p>
<p>(genus/species) Angelina Jolie is an American movie star.<br />
(difference) She is the daughter of American actor Jon Voight.<br />
(property) She trained at the Lee Strasberg Theatre Institute.<br />
(accident) She visited Costa del Sol.</p>
<p>In the automated building of semantic dictionaries, our problem is with accidental predicates. Such predicates have only a weak relationship to a subject and tend to lead to noisy inferred associations. We probably do not want to retrieve a news item about Angelina Jolie given a query about Costa del Sol.</p>
<p>Unfortunately, many and perhaps most predicates in text data are accidental. Current data-driven semantic learning systems do not yet make this distinction, so there are opportunities for major improvements. One possible approach is to employ the techniques of text summarization to identify the most important &#8220;predicates&#8221; in our data and thus bias our statistics away from accidents and toward properties and differences. Aristotle would be amused.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.textwise.com/2011/03/15/cutting-edge-semantics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning</title>
		<link>http://blog.textwise.com/2009/12/21/learning/</link>
		<comments>http://blog.textwise.com/2009/12/21/learning/#comments</comments>
		<pubDate>Mon, 21 Dec 2009 17:24:49 +0000</pubDate>
		<dc:creator>Clinton Mah</dc:creator>
				<category><![CDATA[semantics]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[meaning]]></category>
		<category><![CDATA[semantic]]></category>
		<category><![CDATA[vocabulary]]></category>

		<guid isPermaLink="false">http://blog.textwise.com/?p=147</guid>
		<description><![CDATA[Consider how we humans learn language. Even with formal education, it takes a child about 15 years starting from infancy to be able to read and understand general news articles in the New York Times. Over this period, one would probably hear or read at least on the order of 10 billion words. Even so, [...]]]></description>
			<content:encoded><![CDATA[
<p>Consider how we humans learn language. Even with formal education, it takes a child about 15 years starting from infancy to be able to read and understand general news articles in the New York Times. Over this period, one would probably hear or read at least on the order of 10 billion words. Even so, most high schoolers will need many additional years of schooling to become able to comprehend technical material.</p>
<p>So, how can anyone expect a computer to understand something like medical text after training on only about 100 million words of data? A computer, of course, runs on nanosecond cycles while the human brain operates on millisecond cycles; but we have had about 50,000 generations to evolve our language software, while the electronic computer has had only about 10 generations.</p>
<p>The bottom line is that language learning is difficult, and it requires sifting through immense amounts of data. There is probably no magic technological shortcut, but we have now reached the stage where our systems can routinely handle the volumes of data that would support semantic capabilities equivalent to an 8th-grade education. Decent commercial language processing tools are also now available.</p>
<p>Consequently, we are making major progress on semantic dictionaries, but have to be realistic about the work still ahead of us. Expect no overnight miracles from us or anyone else, especially when these are based on measly samples of data. There is still no royal road to semantics.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.textwise.com/2009/12/21/learning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

