Given a linguistic frontier like Twitter, we are tempted to dive down the rabbit hole of semantic analysis. But for online communities whose evolution outpaces the traditional development of corpora, we need to look at every tool we have to uncover their secrets.
We'll demonstrate how to answer questions like "How do idioms grow and spread?", "What regions are evolving fastest linguistically?", and "What regions' tendencies are spreading or shrinking?" using a Node.js app to collect data from Twitter, MapReduce jobs to slice and dice it upon retrieval, and d3.js to generate shareable graphs of your research.
We'll focus on analyzing n-grams using CouchDB, but we'll also discuss techniques, both statistical and semantic, for studying language and teaching machines to parse it, along with the JavaScript libraries that will help you do that.
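To make the n-gram idea concrete, here is a minimal sketch of what a bigram-counting MapReduce job could look like. It is not the code from the talk: the document shape (a `text` field holding the tweet body), the naive tokenizer, and the helper names `tokenize`, `ngrams`, `mapBigrams`, and `countBigrams` are all assumptions made for illustration. Inside CouchDB, `emit` is a global and the map function receives only `doc`; here `emit` is passed in as a parameter so the sketch can run under plain Node.js.

```javascript
// Naive tokenizer (an assumption): lowercase, keep words, hashtags, and mentions.
function tokenize(text) {
  return text.toLowerCase().match(/[#@]?[a-z0-9']+/g) || [];
}

// Return every run of n consecutive tokens, joined by a single space.
function ngrams(text, n) {
  const tokens = tokenize(text);
  const grams = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.push(tokens.slice(i, i + n).join(' '));
  }
  return grams;
}

// Map step: emit (bigram, 1) for each bigram in a hypothetical tweet document.
// In a real CouchDB view, `emit` is global rather than an argument.
function mapBigrams(doc, emit) {
  if (typeof doc.text !== 'string') return;
  ngrams(doc.text, 2).forEach(function (gram) {
    emit(gram, 1);
  });
}

// Local stand-in for the map step plus a sum-per-key reduce.
function countBigrams(docs) {
  const counts = {};
  docs.forEach(function (doc) {
    mapBigrams(doc, function (key, value) {
      counts[key] = (counts[key] || 0) + value;
    });
  });
  return counts;
}

const sample = [
  { text: 'Callbacks are crazy' },
  { text: 'callbacks are fine' }
];
console.log(countBigrams(sample));
```

In CouchDB itself, the summing would normally be left to the built-in `_sum` reduce and the view queried with `group=true` to get one total per bigram. The helpers would have to be inlined into the map function, since a view's map function stands alone.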
(Based on this talk from PyCon Canada 2013)
In Portland, Oregon born and raised
On the web was where I spent most of my days
Chilling out, maxing, relaxing all cool
and making crappy websites outside of the school
When a couple of guys, who were up to no good
Started coding node in my neighborhood
I wrote one little app and I got scared
and said, "callbacks are crazy but this has some flair!"
--
Max Thayer is a Fun Captain at Cloudant who dreams in JavaScript and only minds callbacks sometimes.