Input data provided in lieu of a filename
-n, --nodata Input data is 'undefined'
-q, --quiet Suppress normal output. 'console.log' will still trigger output.
--strict Use strict JSON parsing instead of more lax 'eval' syntax. To avoid security concerns, use this with ANY data from an external source.
--color Colorize output
--text Parse data as text instead of JSON. Sets input and output formats to 'text'
--trace Print stack traces when things go wrong
--coffee Interpret expression as CoffeeScript. See http://coffeescript.org/
--js Interpret expression as JavaScript. (default is "auto")
Examples:
underscore map --data '[1, 2, 3, 4]' 'value+1'
# [2, 3, 4, 5]
underscore map --data '{"a": [1, 4], "b": [2, 8]}' '_.max(value)'
# [4, 8]
echo '{"foo":1, "bar":2}' | underscore map -q 'console.log("key = ", key)'
# "key = foo\nkey = bar"
underscore pluck --data "[{name : 'moe', age : 40}, {name : 'larry', age : 50}, {name : 'curly', age : 60}]" name
# ["moe", "larry", "curly"]
underscore keys --data '{name : "larry", age : 50}'
# ["name", "age"]
underscore reduce --data '[1, 2, 3, 4]' 'total+value'
# 10
### Data Formats
#### json
The default format. Outputs strictly correct, human-readible JSON w/ smart whitespace. This format has received a lot of love. Try the '--color' flag.
{
"num": 9,
"bool": true,
"str1": "Hello World",
"object0": { },
"object1": { "a": 1, "b": 2 },
"array0": [ ],
"array1": [1, 2, 3, 4],
"array2": [1, 2, null, null, null, 6],
"date1": "2012-06-28T22:02:25.993Z",
"date2": "2012-06-28T22:02:25.993Z",
"err1": { },
"err2": { "3": "three", "prop1": 1, "prop2": 2 },
"regex1": { },
"regex2": { "3": "three", "prop1": 1, "prop2": 2 },
"null1": null,
"deep": {
"a": [
{
"longstr": "This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!",
"b": { "c": { } }
}
],
"g": {
"longstr": "This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!"
}
}
}
#### dense
Output dense JSON using JSON.stringify. Efficient, but hard to read.
{"num":9,"bool":true,"str1":"Hello World","object0":{},"object1":{"a":1,"b":2},"array0":[],"array1":[1,2,3,4],"array2":[1,2,null,null,null,6],"date1":"2012-06-28T22:02:25.993Z","date2":"2012-06-28T22:02:25.993Z","err1":{},"err2":{"3":"three","prop1":1,"prop2":2},"regex1":{},"regex2":{"3":"three","prop1":1,"prop2":2},"null1":null,"deep":{"a":[{"longstr":"This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!","b":{"c":{}}}],"g":{"longstr":"This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!"}}}
#### stringify
Output formatted JSON using JSON.stringify. A bit too verbose.
{
"num": 9,
"bool": true,
"str1": "Hello World",
"object0": {},
"object1": {
"a": 1,
"b": 2
},
"array0": [],
"array1": [
1,
2,
3,
4
],
"array2": [
1,
2,
null,
null,
null,
6
],
"date1": "2012-06-28T22:02:25.993Z",
"date2": "2012-06-28T22:02:25.993Z",
"err1": {},
"err2": {
"3": "three",
"prop1": 1,
"prop2": 2
},
"regex1": {},
"regex2": {
"3": "three",
"prop1": 1,
"prop2": 2
},
"null1": null,
"deep": {
"a": [
{
"longstr": "This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!",
"b": {
"c": {}
}
}
],
"g": {
"longstr": "This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!"
}
}
}
#### pretty
Output a richer 'inspection' syntax. When printing array-and-object graphs that can be generated by JSON.parse, the output is valid JavaScript syntax (but not strict JSON). When handling complex objects not expressable in declarative JavaScript (eg arrays that also have object properties), the output is informative, but not parseable as JavaScript.
{
num: 9,
bool: true,
str1: "Hello World",
object0: { },
object1: { a: 1, b: 2 },
array0: [ ],
array1: [1, 2, 3, 4],
array2: [1, 2, null, undefined, , 6],
date1: 2012-06-28T22:02:25.993Z,
date2: 2012-06-28T22:02:25.993Z{ "3": "three", prop1: 1, prop2: 2 },
err1: [Error: my err msg],
err2: [Error: my err msg]{ "3": "three", prop1: 1, prop2: 2 },
regex1: /^78/,
regex2: /^78/{ "3": "three", prop1: 1, prop2: 2 },
fn1: [Function],
fn2: [Function: fn_name],
fn3: [Function: fn_name],
fn4: [Function],
null1: null,
undef1: undefined,
deep: {
a: [
{
longstr: "This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!",
b: { c: { } }
}
],
g: {
longstr: "This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!"
}
}
}
#### inspect
Uses Node's 'util.inspect' to print the output
{ num: 9,
bool: true,
str1: 'Hello World',
object0: {},
object1: { a: 1, b: 2 },
array0: [],
array1: [ 1, 2, 3, 4 ],
array2: [ 1, 2, null, undefined, , 6 ],
date1: Thu Jun 28 2012 15:02:25 GMT-0700 (PDT),
date2: { Thu, 28 Jun 2012 22:02:25 GMT '3': 'three', prop1: 1, prop2: 2 },
err1: [Error: my err msg],
err2: { [Error: my err msg] '3': 'three', prop1: 1, prop2: 2 },
regex1: /^78/,
regex2: { /^78/ '3': 'three', prop1: 1, prop2: 2 },
fn1: [Function],
fn2: [Function: fn_name],
fn3: { [Function: fn_name] '3': 'three', prop1: 1, prop2: 2 },
fn4: { [Function] '3': 'three', prop1: 1, prop2: 2 },
null1: null,
undef1: undefined,
deep:
{ a:
[ { longstr: 'This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!',
b: { c: {} } } ],
g: { longstr: 'This really long string will force the object containing it to line-wrap. Underscore-cli is smart about whitespace and only wraps when needed!' } } }
#### text
If data is a string, it is printed directly without quotes. If data is an array, elements are separated by newlines. Objects and arrays-within-arrays are JSON formated into a single line. The stock example does not convey the intent of this format, which is designed to enable traditional text processing via JavaScript and to facilitate flattening of JSON lists into line-delimited lists.
# Real World Examples
### Playing with data from a webservice
Let's play with a real data source, like http://www.reddit.com/r/earthporn.json. For convenience (and consistent test results), an abbreviated version of this data is stored in example-data/earthporn.json.
First of all, note how raw unformatted JSON is really hard to parse with your eyes ...
{"kind":"Listing","data":{"modhash":"","children":[{"kind":"t3","data":{"domain":"i.imgur.com","banned_by":null,"media_e
mbed":{},"subreddit":"EarthPorn","selftext_html":null,"selftext":"","likes":null,"saved":false,"id":"rwoa4","clicked":fa
lse,"title":"Eating breakfast in the Norwegian woods! Captured with my phone [2448x3264] ","num_comments":70,"score":960
,"approved_by":null,"over_18":false,"hidden":false,"thumbnail":"http://b.thumbs.redditmedia.com/mytr7zvc0zZdPVV7.jpg","s
ubreddit_id":"t5_2sbq3","author_flair_css_class":null,"downs":352,"is_self":false,"permalink":"/r/EarthPorn/comments/rwo
a4/eating_breakfast_in_the_norwegian_woods_captured/","name":"t3_rwoa4","created":1333763527,"url":"http://i.imgur.com/R
hBFe.jpg","author_flair_text":null,"author":"pansermannen","created_utc":1333738327,"media":null,"num_reports":null,"ups
":1312}},{"kind":"t3","data":{"domain":"imgur.com","banned_by":null,"media_embed":{},"subreddit":"EarthPorn","selftext_h
tml":null,"selftext":"","likes":null,"saved":false,"id":"rwgmb","clicked":false,"title":"The Rugged Beauty of Zion NP Ut
ah at Sunrise [OC] (1924x2579)","num_comments":5,"score":72,"approved_by":null,"over_18":false,"hidden":false,"thumbnail
":"http://f.thumbs.redditmedia.com/0v2GKlqrj35YUaVw.jpg","subreddit_id":"t5_2sbq3","author_flair_css_class":null,"downs"
:20,"is_self":false,"permalink":"/r/EarthPorn/comments/rwgmb/the_rugged_beauty_of_zion_np_utah_at_sunrise_oc/","name":"t
3_rwgmb","created":1333755348,"url":"http://imgur.com/veRJD","author_flair_text":null,"author":"TeamLaws","created_utc":
1333730148,"media":null,"num_reports":null,"ups":92}},{"kind":"t3","data":{"domain":"flickr.com","banned_by":null,"media
_embed":{},"subreddit":"EarthPorn","selftext_html":null,"selftext":"","likes":null,"saved":false,"id":"rvuiu","clicked":
false,"title":"Falls and island near Valdez, AK on a rainy day [4200 x 3000]","num_comments":10,"score":573,"approved_by
As I've already mentioned, it would be trivial to pretty print the data with 'underscore print'. However, if we are just trying to get a sense of the structure of the data, we can do one better:
TODO: working on a 'summarize' command -- INSERT_THAT_HERE (2012-05-04)
Now, let's say that we want a list of all the image titles; using a [json:select](http://jsonselect.org#overview) query, this is downright trivial:
cat example-data/earthporn.json | underscore select .title
Which prints:
[ 'Fjaðrárgljúfur canyon, Iceland [OC] [683x1024]',
'New town, Edinburgh, Scotland [4320 x 3240]',
'Sunrise in Bryce Canyon, UT [1120x700] [OC]',
'Kariega Game Reserve, South Africa [3584x2688]',
'Valle de la Luna, Chile [OS] [1024x683]',
'Frosted trees after a snowstorm in Laax, Switzerland [OC] [1072x712]' ]
If we want to grep the results, 'text' is a better format choice:
cat example-data/earthporn.json | underscore select .title --outfmt text
Fjaðrárgljúfur canyon, Iceland [OC] [683x1024]
New town, Edinburgh, Scotland [4320 x 3240]
Sunrise in Bryce Canyon, UT [1120x700] [OC]
Kariega Game Reserve, South Africa [3584x2688]
Valle de la Luna, Chile [OS] [1024x683]
Frosted trees after a snowstorm in Laax, Switzerland [OC] [1072x712]
Let's create code-style names for those images using the 'camelize' function from [underscore.string] (https://github.com/epeli/underscore.string).
cat earthporn.json | underscore select '.data .title' | underscore map 'camelize(value.replace(/\[.*\]/g,"")).replace(/[^a-zA-Z]/g,"")' --outfmt text
Which prints ...
FjarrgljfurCanyonIceland
NewTownEdinburghScotland
SunriseInBryceCanyonUT
KariegaGameReserveSouthAfrica
ValleDeLaLunaChile
FrostedTreesAfterASnowstormInLaaxSwitzerland
Try doing THAT with any other CLI one-liner!
### Version-bump in package.json
This one is straight out of our own [Makefile](https://github.com/ddopson/underscore-cli/blob/master/Makefile):
underscore -i package.json process 'vv=data.version.split("."); vv[2]++; data.version=vv.join("."); data;' -o package.json
### Getting a greppable list of URLs fetched during the load of a website
This is one I did at work the other day. Chrome --> Dev Console (CMD-OPT-J) --> Network Tab --> (right click context menu) --> Save All as HAR. I have no idea why it's called a "HAR" file, but it's pure JSON data ... pretty verbose stuff, but I just want the urls ...
cat site.har | underscore select '.url' --outfmt text | grep mydomain > urls.txt
Well, I'd also like to [ack](http://betterthangrep.com) through the contents of all those files. Best to get a local snapshot of it all:
cat urls.txt | while read line; do curl $line > $(echo $line | perl -pe 's/https?://([^?]*)[?]?.*/$1'); done
And I'm off to the races analyzing the behavior and load ordering of a complex production site that dynamically loads (literally) hundreds of individual resources off the network. Sure, I could have viewed all that stuff inside Chrome, but I wanted a local directory-structured snapshot that I could serve on a local Nginx instance by adding entries in /etc/hosts that mapped the production domains to 127.0.0.1. Now I can run the exact production site locally, make changes, and see what they would do.
Look at [Examples.md](https://github.com/ddopson/underscore-cli/blob/master/Examples.md) for a more comprehensive list of examples.
# Polish: 1001 Little Conveniences
This section is intended to capture all the places where I spent a great deal of effort to get the best possible behavior on something subtle. aka "polish". It also captures some of the intended "best behaviors" that I haven't had cycles to implement yet.
### Templates as first class NPM modules - ie, real stack traces
When using the 'template' command, we go to great length to provide a fully debuggable experience. We have a custom version of the template compilation code (templates are compiled to JS and then evaluated) that ensures a 1:1 mapping between line numbers in the original *.template file and line numbers in the generated JS code. This code is then loaded as if it were a real Node.js module (literally, using a require() statement). This means that should anything go wrong, the resulting stack traces and sytax exceptions will have correct line numbers from the original template file.
### Expressions auto-return the last value
This one is a bit [CoffeeScript](http://coffee-script.org) inspired. When we parse command-line expressions for commands like 'map', they are evaluated as NodeScript objects. This allows us to retrieve the last value in the expression. In a previous version we wrapped expressions in function boilerplate; however this blocked the use of semicolons within an expression. With first class Script objects, we can evaluate multiple semicolon delimited expressions and still capture the value from the last expression evaluated. Thus, all of the following expressions will return "10".
underscore run '5 + 5'
underscore run 'x=5; y=5; x+y;'
underscore run 'x=5, y=5, x+y;'
This even works to find the last evaluated value inside conditional branches (these also return 10):
underscore run 'x=5; if (x > 0) { 10; } else { 0; }' # last value is 10
underscore run 'x=5; if (x > 0) { y=5; } else { y=-99; } x+y;' # last value is 'x+y'
In general, the principle here is that the code should just return what you intuitively expect without requiring much thought.
### Autodetection of CoffeeScript
If you type a CoffeeScript expression and forget to use the '--coffee' flag, Underscore-CLI will first attempt to parse it as JavaScript, and if that fails, parse it as CoffeeScript.
However, a warning is emitted:
"Warning: Parsing user expression 'foo?.bar?.baz' as CoffeeScript. Use '--coffee' to be more explicit."
Why do we print a warning? Unfortunately, there are a number of language features that are ambiguous between JS and Coffee. ie, expressions that are valid in both languages but with different meaning. For example:
test ? 10 : 20; // JS: if test is true, then 10, else 20
test ? 10 : 20; // Coffee: if test is true, then test, else {10: 20}. Tragic.
### Lazy loading of CoffeeScript library
Loading the 'coffee-script' npm module takes 50+ ms. JSONSelect is another 5ms. That may not sound like much time, but it's the difference between 153 ms and 93 ms, and 153 ms is definitely human perceivable. It will also make a difference if you are writing a quick-and-dirty bash loop that executes underscore-CLI repeatedly. Plus, fast just feels good.
A few more notes... Node.js takes about 33 ms to run "hello world", and 45ms if you either "require('fs')" or 'require' anything that's not pre-compiled into the node executable (pretty hard to avoid that). Adding underscore, underscore.string and a few of node's pre-compiled modules, and basic code loading takes ~60 ms. That leaves ~33 ms spent on actually running code that initializes the command list and decides what to do with the command-line args that were passed in.
### Smart whitespace in output
As mentioned above, dense JSON is nearly unreadable to human beings, so we want to pretty print it. JSON.stringify will accept an 'indentation' parameter that does make JSON much more readible; however, this will put _everything_ on a new line resulting in output that is silly verbose -- printing "[1, 2, 3, 4]" will take up 6 lines despite having only 12ish characters. Node's "util.inspect" is a bit better, but it doesn't print valid JSON (eg, inspect uses single instead of double quotes). I don't want to compromize on JSON compatibility just to get pretty output. So I wrote my own formatter that gives the best of both worlds. The default output format is strictly JSON compatible, human readible, yet avoids excessive verbosity by putting small objects and arrays on a single line where possible. The formatting code is also pretty flexible, allowing me to support colorization and a bunch of other nifty features; at some point, I may break the formatter into it's own npm module.
### Smart auto-consumption of STDIN
TBI - as of this version, if there is no data, we will block for reading STDIN. We should only do this if the user expression refers to the well-known 'data' variable. This would unify the 'process' and 'run' commands.
### Smart auto-detection of return value
TBI - as of this version, the last evaluated expression value is always returned. However, sometimes, you want to mutate the existing data instead of returning a new value. This should be easy. If the expression does something like 'data.key = value', then the return value should be 'data'. Today, you have to write 'data.key = value; data'. I want that last part to be implicit, but only if you mutate the data variable. And there should be a command-line flag "--retval={expr,data,auto}", with 'auto' being the default.
### Efficienct stream processing for set-oriented commands like 'map'
TBI - as of this version, all commands slurp the entire input stream and parse it before doing any data manipulation. This works fine for the vast majority of scenarios, but if you actually had a 30GB JSON file, it would be a bit clunky. For set-oriented commands like 'map', a smarter core engine plus a smarter JSON parser could enable stream-oriented processing where data processing occurs continuously as the input is read and streamed to the output without ever needing to store the entire dataset in memory at once. This feature requires a custom JSON-parser and some serious fancy, but I'll get to it eventually. If you have any performance-sensitive use-cases, post an issue on Github, and I'd be glad to work with you.
# Alternatives
* [jsonpipe] (https://github.com/dvxhouse/jsonpipe) - Python focused, w/ a featureset centered around a single scenario
* [jshon] (http://kmkeen.com/jshon/) - Has a lot of functions, but very terse
* [json-command] (https://github.com/zpoley/json-command) - very limited
* [TickTick] (https://github.com/kristopolous/TickTick) - Bash focused JSON manipulation. Iteresting w/ heavy Bash integration. Complements this tool.
* [json] (https://github.com/trentm/json) - Similar idea.
* [jsawk] (https://github.com/micha/jsawk) - Similar idea. Uses a custom JS environment. Good technical documentation.
* [jsonpath] (http://code.google.com/p/jsonpath/wiki/Javascript) - this is not a CLI tool. It's a runtime JS library.
* [json:select()] (http://jsonselect.org/#tryit) - this is not a CLI tool. CSS-like selectors for JSON. Very interesting idea.... now available as an Underscore-CLI command.
* [RecordStream] (https://github.com/benbernard/RecordStream) - PERL based tool intended to process JSON "rows". I need to add similar multi-record support to underscore-CLI.
Please add a Github issue if I've missed any.