# cluster_chef

**Repository Path**: mirrors_DiUS/cluster_chef

## Basic Information

- **Project Name**: cluster_chef
- **Description**: Big Data Compute Cluster using Chef, Hadoop and Cassandra in the Amazon AWS/EC2 cloud
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-10-22
- **Last Updated**: 2026-05-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


h1. "Version 3 alpha":https://github.com/infochimps/cluster_chef/tree/version_3 now available!

See the "version_3 branch":https://github.com/infochimps/cluster_chef/tree/version_3 for a major upgrade.

If you are a new or prospective adopter of cluster_chef we strongly recommend you begin with the version_3 branch. If you are an existing user, you should hold back until the official release. Our firm target release date is Dec 14.

h3. Version 3 Improvements:

h4. @knife cluster@ toolset

* from @howech, cluster files directly manipulate their cluster/facet role.
* cluster files are now dramatically cleaner and yet more powerful.
* cluster_chef is now a gem (thanks @arp!) -- no more clunky symlinking to get started
* node discovery and synchronization of cloud, cluster and server are both rock-solid
* ec2 cloud now supports EBS volume creation, attachment and tagging; elastic IP association; keypair creation

h4. New Superpowers

* Rundeck integration -- temujin9 can lift small cities with his mind now. Ridiculously awesome.
* (initial) support for Vagrant/Virtualbox -- test all your clusters on your laptop, deploy them to the cloud.

h4. Cookbook cleanup

* all cookbooks have uniform attribute names, directory structure, and shared conventions -- @log_dir@ is always spelled @log_dir@, is always referred to via node attribute (and never sometimes-hardcoded-sometimes-configurable), and always handled uniformly.
* cookbooks are assertive and (adaptably) opinionated: a @conf_dir@ is by default in @/etc/{service_name}@, with permissions @root:root 755@, and all config files live there.
* cookbooks are entirely free of references
* all cookbooks bear complete and accurate @metadata.rb@ and @README.md@ files

h4. Aspects / Discovery and Integration Cookbooks

The most important changes are the development of **aspects** and **integration cookbooks**. Cookbooks repeatably express these and other aspects:
* "I launch these daemons: ..."
* "I haz a bukkit, itz naem '/var/log/lol'"
* "I have a dashboard at 'http://....:...'"
* ... and much more.

Wouldn't it be nice if announcing a log directory caused...
  - my log rotation system to start rotating my logs?
  - a 'disk free space' gauge to be added to the monitoring dashboard for that service?
  - flume (or whatever) to begin picking up my logs and archiving them to a predictable location?
  - in the case of standard apache logs, a listener to start counting the rate of requests, 200s, 404s and so forth?
Similarly, announcing ports should mean
  - the firewall and security groups configure themselves correspondingly
  - the monitor system starts regularly pinging the port for uptime and latency 
  - and pings the interfaces that it should *not* appear on to ensure the firewall is in place?

The version_3 cookbooks make those aspects standardized and predictable, and provide integration and discovery hooks. The key is to make integration *inevitable*: no more forgetting to rotate or monitor a service, or having a config change over here screw up a dependent system over there.
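The announce-and-react idea above can be sketched in a few lines of plain Ruby. This is a toy model only: @AspectRegistry@, @on@, @announce@ and @discover@ are hypothetical names invented for illustration, not the version_3 cookbook API, and the port number is made up.

```ruby
# Toy registry: components announce aspects, integrations react to them.
# All class and method names here are hypothetical, not cluster_chef APIs.
class AspectRegistry
  def initialize
    @handlers  = Hash.new { |hash, key| hash[key] = [] }
    @announced = []
  end

  # An integration registers interest in one kind of aspect (:log, :port, ...).
  def on(kind, &handler)
    @handlers[kind] << handler
  end

  # A component announces an aspect; every interested integration is notified.
  # Returns the list of results produced by the handlers.
  def announce(component, kind, info)
    @announced << [component, kind, info]
    @handlers[kind].map { |handler| handler.call(component, info) }
  end

  # Discovery: list everything announced so far for a given kind.
  def discover(kind)
    @announced.select { |_, k, _| k == kind }
  end
end

registry = AspectRegistry.new
registry.on(:log)  { |comp, info| "logrotate: rotate #{info[:dir]} for #{comp}" }
registry.on(:log)  { |comp, info| "monitor: disk-free gauge on #{info[:dir]} for #{comp}" }
registry.on(:port) { |comp, info| "firewall: open port #{info[:port]} for #{comp}" }

actions = registry.announce(:lol_service, :log,  { dir: '/var/log/lol' }) +
          registry.announce(:lol_service, :port, { port: 8080 })
actions.each { |line| puts line }
```

The point of the design is that the service only states a fact about itself once; log rotation, monitoring and firewalling each pick it up without the service knowing they exist.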

Joe Bob says check it out.

------------------------------------------------------

h1. cluster_chef

Chef is a powerful tool for maintaining and describing the software and configurations that let a machine provide its services.

cluster_chef is

* a clean, expressive way to describe how machines and roles are assembled into a working cluster.
* our collection of industrial-strength, cloud-ready recipes for Hadoop, Cassandra, HBase, Elasticsearch and more.
* a set of conventions and helpers that make provisioning cloud machines easier.

h2. Walkthrough

Here's a basic 3-node hadoop cluster:

<pre>
    ClusterChef.cluster 'demohadoop' do
      merge!('defaults')
      
      facet 'master' do
        instances           1
        role                "nfs_server"
        role                "hadoop_master"
        role                "hadoop_worker"
        role                "hadoop_initial_bootstrap"
      end

      facet 'worker' do
        instances           2
        role                "nfs_client"
        role                "hadoop_worker"
        server 0 do
          chef_node_name 'demohadoop_worker_zero'
        end
      end

      cloud :ec2 do
        image_name          "lucid"
        flavor              "c1.medium"
        availability_zones  ['us-east-1d']
        security_group :logmuncher do
          authorize_group "webnode"
        end
      end
      
    end
</pre>

This defines a *cluster* (group of machines that serve some common purpose) with two *facets*, or unique configurations of machines within the cluster. (For another example, a webserver farm might have a loadbalancer facet, a database facet, and a webnode facet).

In the example above, the master serves out a home directory over NFS, and runs the processes that distribute jobs to hadoop workers. In this small cluster, the master also runs the worker processes itself, and has a utility role that helps initialize it out of the box.

There are 2 workers; they use the home directory served out by the master, and run the hadoop worker processes. 

Lastly, we define what machines to use for this cluster. Instead of having to look up and type in an image ID, we just say we want the Ubuntu 'Lucid' distribution on a c1.medium machine. cluster_chef understands that this means we need the 32-bit image in the us-east-1 region, and makes the cloud instance request accordingly. It also creates a 'logmuncher' security group, opening it so all the 'webnode' machines can push their server logs onto the HDFS.
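To make the image lookup concrete, here is a minimal sketch of how a friendly name, a flavor and an availability zone could be resolved to an image ID. It is not cluster_chef's actual code: the table contents, the helper names @region_for@ and @image_id_for@, and the @ami-...@ values (deliberate placeholders, not real images) are all assumptions for illustration.

```ruby
# Hypothetical lookup tables; the AMI ids are placeholders, not real images.
FLAVOR_BITS = { 'c1.medium' => '32-bit', 'm1.large' => '64-bit' }.freeze

IMAGES = {
  ['us-east-1', '32-bit', 'lucid'] => 'ami-XXXXXXXX',
  ['us-east-1', '64-bit', 'lucid'] => 'ami-YYYYYYYY'
}.freeze

# An availability zone is its region name plus a trailing letter,
# so 'us-east-1d' belongs to region 'us-east-1'.
def region_for(availability_zone)
  availability_zone.sub(/[a-z]\z/, '')
end

# Resolve a friendly image name to the image id to request.
def image_id_for(image_name, flavor, availability_zone)
  bits = FLAVOR_BITS.fetch(flavor)
  key  = [region_for(availability_zone), bits, image_name]
  IMAGES.fetch(key) { raise ArgumentError, "no image known for #{key.inspect}" }
end

puts image_id_for('lucid', 'c1.medium', 'us-east-1d')
```

Keeping the mapping in one table means a cluster file never has to mention a region-specific image ID.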

The following commands launch each machine, and once ready, ssh in to install chef and converge all its software.

<pre>
knife cluster launch demohadoop master --bootstrap
knife cluster launch demohadoop worker --bootstrap
</pre>

You can also now launch the entire cluster at once with the following:

<pre>
knife cluster launch demohadoop --bootstrap
</pre>

The cluster launch operation is idempotent: nodes that are already running won't be started again!
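One way to picture that idempotency: expand the cluster into the servers it should have, then launch only the ones that are not running yet. The sketch below is an illustration under stated assumptions, not the knife plugin's implementation; the helper names and the @cluster-facet-index@ naming scheme are invented for this example.

```ruby
require 'set'

# Expand a cluster description into one server name per facet instance.
# The "cluster-facet-index" naming is an assumption made for this sketch.
def desired_servers(cluster, facets)
  facets.flat_map do |facet, count|
    (0...count).map { |index| "#{cluster}-#{facet}-#{index}" }
  end
end

# Idempotent launch plan: only servers that are not already running get started.
def servers_to_launch(desired, running)
  desired.reject { |name| running.include?(name) }
end

desired = desired_servers('demohadoop', { 'master' => 1, 'worker' => 2 })
running = Set['demohadoop-master-0']

first_pass = servers_to_launch(desired, running)
puts first_pass.inspect

# After those servers are launched, a second pass finds nothing left to do.
running.merge(first_pass)
puts servers_to_launch(desired, running).inspect
```

Running the plan twice is harmless because the second plan is empty, which is the property that makes it safe to re-run a launch command after a partial failure.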

h2. Philosophy

Some general principles of how we use chef.

* *Chef server is never the repository of truth* -- it only mirrors the truth.
  - a file is tangible and immediate to access
* Specifically, we want truth to live in the git repo, and be enforced by the chef server. *There is no truth but git, and chef is its messenger*.
  - this means that everything is versioned, documented and exchangeable.
* *Systems, services and significant modifications to a cluster should be obvious from the @clusters@ file*.  I don't want to have to bounce around nine different files to find out which thing installed a redis:server.
  - basically, the existence of anything that opens a port should be obvious when I look at the cluster file.
* *Roles define systems, clusters assemble systems into a machine*.
  - For example, a resque worker queue has a redis, a webserver and some config files -- your cluster should invoke a @whatever_queue@ role, and the @whatever_queue@ role should include recipes for the component services.
  - the existence of anything that opens a port _or_ runs as a service should be obvious when I look at the roles file.
* *include_recipe considered harmful*: do NOT use include_recipe for anything that a) provides a service, b) launches a daemon or c) is interesting in any way (so: @include_recipe java@ yes; @include_recipe iptables@ no). You should note the dependency in the @metadata.rb@. This seems weird, but the breaking behavior is purposeful: it makes you explicitly state all dependencies.
* It's nice when *machines are in full control of their destiny*.
  - initial setup (elastic IP, attaching a drive) is often best enforced externally
  - but machines should be able to independently assert things like load balancer registration that might change at any point in their lifetime.
* It's even nicer, though, to have *full idempotency from the command line*: I can at any time push truth from the git repo to the chef server and know that it will take hold.


---------------------------------------------------------------------------

h2. Getting Started

This assumes you have installed chef, have a working chef server, and have an AWS account. If you can run knife and use the web browser to see your EC2 console, you can start here. If not, see the instructions below.

h3. Setup

Install these gems (you may have several installed already):

<pre>
sudo gem install chef fog highline configliere net-ssh-multi formatador terminal-table gorillib extlib
</pre>

(and maybe right_aws and broham?)

h4. Add cluster_chef to your cookbooks repo

This assumes the same setup as described in the knife quickstart walkthrough, but to keep things convenient we'll use these shortcuts to refer to the relevant directories:

<pre>
CHEF_REPO_DIR=$HOME/PATH/TO/chef-repo      # has cookbooks/ and site-cookbooks/
DOT_CHEF_DIR=$HOME/PATH/TO/chef-repo/.chef # has your knife.rb, USERNAME.pem etc
</pre>

I recommend adding cluster chef as a git submodule; you can instead clone it elsewhere.

<pre>
cd $CHEF_REPO_DIR
git submodule add git@github.com:infochimps/cluster_chef.git cluster_chef
</pre>

h4. Knife setup

In your <code>$DOT_CHEF_DIR/knife.rb</code>, modify the cookbook path (to include cluster_chef/cookbooks and cluster_chef/site-cookbooks) and add settings for @cluster_chef_path@, @cluster_path@ and @keypair_path@. Here's mine:

<pre>
current_dir = File.dirname(__FILE__)
organization  = 'CHEF_ORGANIZATION'
username      = 'CHEF_USERNAME'

# The full path to your cluster_chef installation
cluster_chef_path File.expand_path("#{current_dir}/../cluster_chef")
# The list of paths holding clusters
cluster_path      [ File.expand_path("#{current_dir}/../clusters") ]
# The directory holding your cloud keypairs
keypair_path      File.expand_path(current_dir)

log_level                :info
log_location             STDOUT
node_name                username
client_key               "#{keypair_path}/#{username}.pem"
validation_client_name   "#{organization}-validator"
validation_key           "#{keypair_path}/#{organization}-validator.pem"
chef_server_url          "https://api.opscode.com/organizations/#{organization}"
cache_type               'BasicFile'
cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )

# The first things have lowest priority (so, site-cookbooks gets to win)
cookbook_path            [
  "#{cluster_chef_path}/cookbooks",
  "#{current_dir}/../cookbooks",
  "#{current_dir}/../site-cookbooks",
]

# If you primarily use AWS cloud services:
knife[:ssh_address_attribute] = 'cloud.public_hostname'
knife[:ssh_user] = 'ubuntu'

# Configure bootstrapping
knife[:bootstrap_runs_chef_client] = true
bootstrap_chef_version   "~> 0.10.0"

# AWS access credentials
knife[:aws_access_key_id]      = "XXXXXXXXXXX"
knife[:aws_secret_access_key]  = "XXXXXXXXXXXXX"

</pre>
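The comment on @cookbook_path@ says that earlier entries have the lowest priority. The following self-contained sketch models only that stated precedence rule (the last path containing a cookbook wins); @winning_cookbook_dir@ is a hypothetical helper, and Chef's real cookbook loader may behave differently across versions.

```ruby
require 'tmpdir'
require 'fileutils'

# Return the directory that wins for a cookbook: the LAST path containing it.
# Returns nil when no path has the cookbook.
def winning_cookbook_dir(cookbook_paths, cookbook)
  cookbook_paths
    .map { |path| File.join(path, cookbook) }
    .select { |dir| File.directory?(dir) }
    .last
end

Dir.mktmpdir do |root|
  paths = %w[cluster_chef/cookbooks cookbooks site-cookbooks].map { |p| File.join(root, p) }

  # 'nfs' exists in the first and last path; 'apt' only in the first.
  FileUtils.mkdir_p(File.join(paths.first, 'nfs'))
  FileUtils.mkdir_p(File.join(paths.first, 'apt'))
  FileUtils.mkdir_p(File.join(paths.last,  'nfs'))

  puts winning_cookbook_dir(paths, 'nfs')  # the site-cookbooks copy
  puts winning_cookbook_dir(paths, 'apt')  # the cluster_chef/cookbooks copy
end
```

Under this rule, dropping a modified copy of a cookbook into site-cookbooks overrides the stock one without touching the cluster_chef checkout.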

h4. Push to chef server

To send all the cookbooks and roles to the chef server, visit your cluster_chef directory and run:

<pre>
cd $CHEF_REPO_DIR
mkdir -p $CHEF_REPO_DIR/site-cookbooks
knife cookbook upload --all
for foo in roles/*.rb ; do knife role from file $foo & sleep 1 ; done
</pre>

You should see all the cookbooks defined in cluster_chef/cookbooks (ant, apt, ...) listed among those it uploads.

h4. Cluster chef's Knife plugins

<pre>
mkdir -p $DOT_CHEF_DIR/plugins
ln -s $CHEF_REPO_DIR/cluster_chef/lib/cluster_chef/knife           $DOT_CHEF_DIR/plugins/
ln -s $CHEF_REPO_DIR/cluster_chef/lib/cluster_chef/knife/bootstrap $DOT_CHEF_DIR/
</pre>

If you type @knife cluster launch@ you should see that it found the new scripts:

<pre>
    ** CLUSTER COMMANDS **
    knife cluster launch CLUSTER_NAME [FACET_NAME] (options)
    knife cluster show CLUSTER_NAME [FACET_NAME] (options)
</pre>

h3. Your first cluster

Let's create a cluster called 'demosimple'. It's, well, a simple demo cluster.

h4. Create a simple demo cluster

Create a directory for your clusters; copy the demosimple cluster and its associated roles from cluster_chef:

<pre>
cd $CHEF_REPO_DIR
mkdir -p $CHEF_REPO_DIR/clusters
cp cluster_chef/clusters/{defaults,demosimple}.rb ./clusters/
cp cluster_chef/roles/{big_package,nfs_*,ssh,base_role,chef_client}.rb  ./roles/
for foo in roles/*.rb ; do knife role from file $foo ; done
</pre>

Symlink in the cookbooks you'll need, and upload them to your chef server:

<pre>
cd $CHEF_REPO_DIR/cookbooks
ln -s ../cluster_chef/site-cookbooks/{nfs,big_package,cluster_chef,cluster_service_discovery,firewall,motd} .
knife cookbook upload nfs big_package cluster_chef cluster_service_discovery firewall motd
</pre>

h4. AWS credentials

Make a cloud keypair, a secure key for communication with Amazon AWS cloud. cluster_chef expects a keypair named after its cluster -- this is a best practice that helps keep your environments distinct.

# Log in to the "AWS console":http://bit.ly/awsconsole and create a new keypair named @demosimple@. Your browser will download the private key file.
# Move the private key file you just downloaded to your .chef dir, and make the private key unsnoopable, or ssh will complain:

<pre>
mv ~/downloads/demosimple.pem $DOT_CHEF_DIR/demosimple.pem
chmod 600 $DOT_CHEF_DIR/*.pem
</pre>
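If you want to double-check the permissions afterwards, a small Ruby helper along these lines can list any key files that are still readable by group or others. @snoopable_keys@ is a hypothetical helper written for this README, and the demo uses a temporary directory with dummy files rather than your real @.chef@ dir.

```ruby
require 'tmpdir'

# List .pem files whose permission bits allow any group or other access.
def snoopable_keys(dir)
  Dir.glob(File.join(dir, '*.pem')).reject do |path|
    (File.stat(path).mode & 0o077).zero?
  end
end

Dir.mktmpdir do |dot_chef_dir|
  good = File.join(dot_chef_dir, 'demosimple.pem')
  bad  = File.join(dot_chef_dir, 'other.pem')
  File.write(good, 'not a real key')
  File.write(bad,  'not a real key')
  File.chmod(0o600, good)  # owner read/write only
  File.chmod(0o644, bad)   # also readable by group and others

  puts snoopable_keys(dot_chef_dir).map { |path| File.basename(path) }.inspect
end
```

To check your own keys, call @snoopable_keys@ with the path of your @.chef@ directory; an empty list means every key file is private.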

h3. Cluster chef knife commands

h4. knife cluster launch

Hooray! You're ready to launch a cluster:

<pre>
    knife cluster launch demosimple homebase --bootstrap
</pre>

It will kick off a node and then bootstrap it. You'll see it install a whole bunch of things. Yay.

h2. Extended Installation Notes

h3. Set up Knife on your local machine, and a Chef Server in the cloud

If you already have a working chef installation you can skip this section.

To get started with knife and chef, follow the "Chef Quickstart":http://wiki.opscode.com/display/chef/Quick+Start. We use the hosted chef service and are very happy with it, but there are instructions on the wiki to set up a chef server too. Stop when you get to "Bootstrap the Ubuntu system": cluster chef is going to make that much easier.

h3. Cloud setup

If you can use the normal knife bootstrap commands to launch a machine, you can skip this step.

Steps:

* sign up for an AWS account
* Follow the "Knife with AWS quickstart":http://wiki.opscode.com/display/chef/Launch+Cloud+Instances+with+Knife on the opscode wiki.

Right now cluster chef works well with AWS.  If you're interested in modifying it to work with other cloud providers, "see here":https://github.com/infochimps/cluster_chef/issues/28 or get in touch.

h3. Tips and Troubleshooting

h5. Help, a git deploy recipe has gone limp

Suppose you are using the @git@ resource to deploy a recipe (@george@ for sake of example). If @/var/chef/cache/revision_deploys/var/www/george@ exists then *nothing* will get deployed, even if /var/www/george/{release_sha} is empty or screwy.  If git deploy is acting up in any way, nuke that cache from orbit -- it's the only way to be sure.

<pre>
sudo rm -rf /var/www/george/{release_sha} /var/chef/cache/revision_deploys/var/www/george
</pre>

h5. Hadoop namenode bootstrap

Once the master runs to completion with all daemons started, remove the hadoop_initial_bootstrap role from its run_list. (Note that you may have to edit the run_list on the machine itself, depending on how you bootstrapped the node.)

h5. Problems starting NFS server on Ubuntu Maverick

For problems starting the NFS server on Ubuntu Maverick systems, read, understand and then run /tmp/fix_nfs_on_maverick_amis.sh -- see "this thread for more":http://fossplanet.com/f10/[ec2ubuntu]-not-starting-nfs-kernel-daemon-no-support-current-kernel-90948/

h5. Setup for Knife versions before 0.10

On older versions of chef that don't have a plugin mechanism for new commands, we have to do surgery on the knife itself: we'll just symlink the new commands into chef's lib/chef/knife directory, and symlink the bootstrap templates into chef's lib/chef/knife/bootstrap directory. Set @CLUSTER_CHEF_DIR@ to the path of your cluster_chef directory and run the following:

<pre>
    sudo ln -s $CLUSTER_CHEF_DIR/lib/cluster_chef/knife/*.rb            $(dirname `gem which chef`)/chef/knife/
    sudo ln -s $CLUSTER_CHEF_DIR/lib/cluster_chef/knife/bootstrap/*     $(dirname `gem which chef`)/chef/knife/bootstrap/
</pre>