
Scaling Puppet with Git

More and more people are turning to systems automation tools like Puppet and Chef to get the most out of their environments, and to create time to focus on delivering business benefits. Scaling Puppet is most commonly done using client/server mode, in which every client is issued with an SSL certificate and talks to a central server over HTTPS; manifests and assets are served over the network and applied by a locally running Puppet daemon. However, is there a better way? We present an alternative to the traditional Puppetmaster solution which we like to call 'Git Puppet'.

Guest article by Stephen Nelson-Smith

Update: There is a fuller and more up-to-date description of my Git-based Puppet infrastructure in The Puppet Cookbook - so after you read this introduction, you may find it helpful to read the book too.

Drawbacks of puppetmaster / puppetd

Although this is the most popular deployment strategy, there are a few drawbacks to this method.

Fiddly to set up

Firstly, it's a bit fiddly to set up. A quick glance at the Puppet mailing list or the IRC channel shows a slight tendency for new users to get a bit muddled when setting up the puppetmaster and sorting out the SSL certificates. It isn't especially difficult, but it's not brain-deadeningly, works-automatically simple.

WEBrick doesn’t scale

Puppet, out of the box, uses the WEBrick HTTP server. WEBrick is a simple HTTP server from the Ruby standard library, best known as the built-in development webserver for Ruby on Rails, and designed primarily for local development work. It doesn't scale very well - my experience has been that once you hit about thirty clients, it really begins to struggle. It's also not ever so fast. None of this is a surprise - it's a development webserver - but it does mean that in order to scale we need to set up a grown-up webserver.

Mongrel / Passenger not ideal (esp. for auto setup)

The most popular approach to scaling Puppet is to use either Mongrel, with a proxy server such as Nginx, or to use Passenger (mod_rails) plugged into Apache or Nginx. Now again, this isn’t especially difficult, but it certainly falls into the category of non-trivial. Again, a quick survey of the mailing list and the IRC channel will indicate that one of the most popular support requests is getting Puppet to play nicely with Passenger. Also, from the perspective of an engineer who wants to automate and simplify their infrastructure, Passenger isn’t an ideal choice - installation requires manual steps, and, last time I looked, on CentOS the preferred installation method was to use either gems or a tarball.

Puppetd memory issue

So much for the server side - what about the client? Well, it’s not all good news here either. Many users have experienced problems with running the native Puppet daemon on their Puppet clients. Things may have improved in the most recent releases, but my experience has been that it exhibits behaviour consistent with that of an application with a memory leak. I ended up running puppetd out of cron, and with a random delay to prevent overloading the puppetmaster by having too many simultaneous connections. Several Puppet experts I know have done similarly.
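
For illustration, a cron entry along those lines might look something like this (a sketch only - the schedule, delay and paths are assumptions to adapt to your environment):

# /etc/cron.d/puppetd: run the client hourly with up to ten minutes of random delay.
# RANDOM needs bash, and % must be escaped because cron treats it as a newline.
SHELL=/bin/bash
0 * * * * root sleep $((RANDOM \% 600)); /usr/sbin/puppetd --onetime --no-daemonize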

Multi-site setup

A final drawback is that an organisation's systems infrastructure is frequently dispersed over a number of physical (and network) locations. It's not uncommon to find a company with a production network at one hosting provider, a disaster recovery setup at another, a staging environment in their own server room, and a number of local developer networks of VMs, all requiring their configurations to be managed. Quite aside from the volume of machines and connections this can generate, some of these networks have fairly conservative firewall policies, and where the distances between sites are significant, even with the improved performance of the latest Puppet release, administrators start to feel it makes more sense to run local instances of puppetmasterd - and then have to engineer some way of keeping them in sync.

Note that none of these drawbacks are show-stoppers - this is by far the most popular way to deploy Puppet, and all of these problems can be solved. I’m just pointing out that these drawbacks are present, and if we can find a simpler and more elegant solution, we should give it serious consideration.

Our vision/requirements

So - what’s our vision for an improved solution? I think it needs to encompass the following requirements:

  • There should be no need to set up complicated server environments involving third party modules, and proxy servers
  • There should be no fussing about with SSL certificates
  • It should comfortably scale to hundreds of servers without needing to resort to setting up an enterprise web application stack
  • It should be fast

The plan

A useful feature of Puppet is that it ships with a standalone interpreter: we can write a Puppet manifest and apply it directly from the command line, with no server involved. Here's an example:

package { "screen":
  ensure => installed,
}

Simply save this as something.pp and run it with Puppet:

[root@gruyere ~]# puppet -v something.pp 
info: Applying configuration version '1264090058'
notice: //Package[screen]/ensure: created

All of Puppet’s power can be harnessed in this way. A common way to organise Puppet config is into modules - for example, you might have a module called ‘sshd’ which has Puppet install sshd. Modules are a feature, available since 0.22.2, which makes it easy to gather together files, resources, Puppet classes, templates and definitions into a related bundle with a shared namespace. For example we could write a Puppet mysql module that contains the packages, the my.cnf, and defines a service, together with its relationships to the config and package. These modules can be packaged and reused - and, to make this possible, Puppet includes the config directive modulepath. Any config directive can be passed to Puppet on the command line, so we can include our modules like this:

# puppet -v --modulepath=/srv/puppet/modules something.pp

We can then put any modules in /srv/puppet/modules, and Puppet will be able to use them.
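
To make that concrete, the hypothetical mysql module mentioned above might be laid out on disk something like this (the file names are illustrative):

/srv/puppet/modules/mysql/
    manifests/
        init.pp        # class mysql { ... } - package, service and their relationships
    files/
        my.cnf         # served to nodes as puppet:///modules/mysql/my.cnf
    templates/
        my.cnf.erb     # optional, if the config needs to vary per node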

Best practice is to keep Puppet manifests (code) in a version control system. Given that it is possible to execute Puppet manifests locally with the standalone script, can we combine these two? Well, yes - we can. We keep the Puppet config in a version control system, check it out on the local machine, and run Puppet against it locally.

In 2010 you have two major choices for version control - do you go for a traditional, central repository such as Subversion, or even CVS? Or do you go for a distributed version control system such as Git, Mercurial or Bazaar?

Well, let’s look at our requirements list. Both Subversion and Git are very simple to set up - Git slightly simpler. There’s no need for SSL certificates - we can run everything over SSH, if we decide we want encryption. Git is definitely faster than Subversion in a lot of ways (by design, all commits are local), but I’ve not done a test to quantify it. So at this stage, either would do the trick.

However, let’s think a bit more about the environment in which we’re going to work. We’re going to have a bunch of servers whose config we’re going to manage via Puppet. Wouldn’t it be really handy to be able to manage this via push rather than pull? That leaves us feeling more in control, and we can rate-limit clients hitting any shared resource. From a central hub we could say: ok - push this config out to all staging servers. Or all webservers. Or just one server. It makes sense from a security perspective too - the central Puppet server is a valuable and vulnerable machine. If we push out to servers, we need only allow SSH out - we don’t need to open port 8140 to a DMZ or public hosts.

A further consideration is the possibility that we might have a number of sysadmins or developers (or devops) working on Puppet manifests. It would be awesome to make it very, very easy for them to branch, play, test, pull in stuff from the main production repo, and share independently with each other. This is exactly the sort of situation for which Git and other distributed version control systems were built.

So - let’s set up Git repos in such a way as to permit us to push to each machine or groups of machines (which I will call ‘spokes’) from a central machine (which I will henceforth refer to as the ‘hub’).

Bootstrap the Git-Puppet infrastructure

Let's start on your workstation, where we will write the Puppet manifests. You're going to need to install Git. Packages are available for most OSes and distros - I won't attempt to second-guess your environment. My workstation is a MacBook Pro, with a CentOS VM providing userland tools. Pick your poison. So, install Git, create a simple directory structure for the Puppet manifests, and then create a Git repo:

# mkdir -p puppet/manifests puppet/modules
# cd puppet
# git init

We also need an SSH key for Git to use:

# ssh-keygen -f git.rsa

Install Puppet and Git

OK, we’re going to need to install Puppet and Git on all our machines. Most of my clients run CentOS or Redhat, so I provide a small tarball which contains the latest Puppet RPMs, and its dependencies. I then do a yum local install of these, which provides me with the local Puppet interpreter. I then write (or simply download) a bootstrap Puppet manifest to provide a custom repo, a user, key based authentication and a bare Git repo.

Here’s an example:

user { "git":
  ensure => "present",
  home => "/var/git",
}

file {
  "/var/git":
    ensure  => directory,
    owner   => git,
    require => User["git"];
  "/var/git/puppet":
    ensure  => directory,
    owner   => git,
    require => [User["git"], File["/var/git"]];
}

ssh_authorized_key { "git":
  ensure => present,
  key => "INSERT PUBLIC KEY HERE",
  name => "git@atalanta-systems.com",
  target => "/var/git/.ssh/authorized_keys",
  type => rsa,
  require => File["/var/git"],
}

yumrepo { "example":
  baseurl => "http://packages.example.com",
  descr => "Example Package Repository",
  enabled => 1,
  gpgcheck => 0,
  name => "example",
}

package { "git":
  ensure => installed,
  require => Yumrepo["example"],
}

exec { "Create puppet Git repo":
  cwd => "/var/git/puppet",
  user => "git",
  command => "/usr/bin/git init --bare",
  creates => "/var/git/puppet/HEAD",
  require => [File["/var/git/puppet"], Package["git"], User["git"]],
}

Copy this to every machine, install Puppet and run puppet -v this_file.pp. Puppet’s ‘file’ type will set up a directory to allow SSH copy of the manifests via the git user. (Note that this example manifest doesn’t show how to manage a Unix group using Puppet - we’ll assume here that the user provider automatically creates the required group.)
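
As a rough illustration, that step might look something like this from the workstation (the tarball, host and manifest names are assumptions, and the tarball is assumed to unpack its RPMs straight into /tmp):

scp puppet-rpms.tar.gz bootstrap.pp root@spoke1:/tmp/
ssh root@spoke1 'cd /tmp && tar xzf puppet-rpms.tar.gz && yum localinstall -y *.rpm'
ssh root@spoke1 'puppet -v /tmp/bootstrap.pp'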

I suggest installing the private key on your workstation and on your hub server - this allows you to connect over SSH to all your spoke machines, as the Git user.

Note that if you get an error “Puppet ssh_authorized_key no such file or directory /home .ssh” you may need to make sure that the .ssh directory is created within git’s home directory.
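
One way to avoid that error is to manage the directory in the bootstrap manifest itself - a sketch (you would also add it to the ssh_authorized_key resource's require list):

file { "/var/git/.ssh":
  ensure  => directory,
  owner   => "git",
  mode    => 700,
  require => File["/var/git"],
}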

So, now we need to push a copy of our Puppet code from our workstation to the hub:

# git remote add hub ssh://git@hubserver/var/git/puppet

Now let’s create some content - I recommend creating a module such as sudo:

# cd puppet/modules
# mkdir -p sudo/manifests sudo/files

In the modules/sudo/manifests directory, create an init.pp which simply contains the line:

import "*"

This will import any other files in the directory.

Now let’s have an install.pp:

# cat <<EOF > sudo/manifests/install.pp
class sudo::install {
  package { "sudo":
    ensure => installed,
  }
}
EOF

and also a sudoers.pp:

class sudo::sudoers {

  file { "/tmp/sudoers":
    mode => 440,
    source => "puppet:///modules/sudo/sudoers",
    notify => Exec["check-sudoers"],
  }

  exec { "check-sudoers":
    command => "/usr/sbin/visudo -cf /tmp/sudoers && cp /tmp/sudoers /etc/sudoers",
    refreshonly => true,
  }

}

Now copy your /etc/sudoers file (with or without modification - this is for illustrative purposes only) to sudo/files/sudoers, so that it ends up at modules/sudo/files/sudoers.

Great - now we have the elements of managing sudoers. We just need to create a node definition to include it:

# cd puppet/manifests
# mkdir nodes

# cat <<EOF > nodes/yarg.pp
node yarg {
  include sudo::install
  include sudo::sudoers
}
EOF

Finally, we need a site.pp:

import "nodes/*.pp"

Add it:

# cd puppet
# git add . 
# git commit -m "Initial import"

Now, use ssh-agent to stash the ssh key:

# ssh-agent bash
# ssh-add /path/to/git.rsa
# git push hub master

So now we have a central repo on our hub server. We can clone this to any other workstations as required, and push updates from our local repos to the Puppet hub.

We now want to be able to push out Puppet config from the hub.

The principle is very simple - we simply add remotes to the repo on the hub. For example:

# git remote add web1 ssh://git@web1.mydomain.com/var/git/puppet

Where it gets clever is that we can specify multiple URLs for each remote. In the Git config (/var/git/puppet/config in our example), we just add a stanza such as the following:

[remote "webs"]
        url = ssh://git@matisse/var/git/puppet
        url = ssh://git@picasso/var/git/puppet
        url = ssh://git@klee/var/git/puppet

Now, provided we stash the ssh key using ssh-agent, or craft an SSH config file which specifies it, we can push from the hub to any machine, or any group of machines.
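
For the SSH config route, an entry on the hub along these lines would do the trick (the host names match the example remotes above; the key path is wherever you put the git.rsa key generated earlier):

# ~/.ssh/config on the hub
Host matisse picasso klee
    User git
    IdentityFile /path/to/git.rsa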

We now need to arrange for the Puppet code to appear on the spoke machines. To achieve this we use a Git post-receive hook:

#!/bin/sh
git archive --format=tar HEAD | (cd /etc/puppet && tar xf -)

Replace the default post-receive hook in /var/git/puppet/hooks/post-receive with the above, and chmod +x it.

We could have the post-receive hook actually run Puppet, we could run Puppet out of cron, or we could run Puppet manually - I leave the choice up to the reader. Whichever you choose, you'll need to point Puppet at the path where your modules have been pushed, for example:

puppet --modulepath=/etc/puppet/modules /etc/puppet/manifests/site.pp
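
If you do want the hook to apply the configuration as soon as it arrives, a minimal sketch might be the following (assuming the code is unpacked to /etc/puppet as above, and that the git user is allowed to run Puppet via passwordless sudo, which you would need to arrange):

#!/bin/sh
# unpack the pushed code, then apply it immediately
git archive --format=tar HEAD | (cd /etc/puppet && tar xf -)
sudo puppet -v --modulepath=/etc/puppet/modules /etc/puppet/manifests/site.pp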

Ability to pull

Although our push model seems effective and, in many ways, ideal, it would be good to open the possibility of being able to pull from one of the spoke servers. The kind of situation I’m imagining here would be a developer who has a handful of test VMs on their workstation, which they want to be built with Puppet, but which are out of the reach of the Puppet server.

There are a couple of ways to achieve this. From this firewalled-off machine, we can clone the repo, as long as we have access to any of the machines to which we push Puppet, including the hub. We could set up an SSH key, and allow SSH from the IP to which the VM is NAT'd. We could fire up git-daemon on any machine with a repo, and clone it that way. Or we could run a webserver on any of the machines, with a virtual host pointing to the Git repo. The only question to consider is how to tighten up security enough to allow the NAT'd IP access to the Puppet source code. Perhaps the simplest approach, if one of the devops folk has a laptop or workstation on the same network and has their own copy of the repo, is to make a bare clone of their repo (git clone --bare), and serve it via git-daemon temporarily. There are many ways to skin this cat!
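
For instance, the temporary git-daemon approach might look something like this on the colleague's workstation (the host name and paths are illustrative):

git clone --bare ~/puppet /tmp/puppet.git
git daemon --reuseaddr --base-path=/tmp --export-all --verbose

# and then, from the firewalled-off VM:
git clone git://workstation.local/puppet.git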

Further benefits

A full discussion of the merits of distributed version control is clearly outside the remit of this article, but having a Puppet infrastructure powered by Git opens up a whole world of collaborative development opportunities. One particularly attractive possibility is how easy it becomes to work on a branch and carry out test-driven Puppet development. We can easily push a branch (or a tag) to a test machine, verify that it works, and merge it back into the main production branch, without any headache at all. We could build a simple system around, for example, Rake and cucumber-puppet, to move towards a genuine test-first approach (cucumber-puppet is a tool for behavioural testing of Puppet manifests).

A further idea would be to configure the post-receive hook to behave differently depending on the branch or tag to which the code was pushed. The post-receive hook has access to the ref name, which takes the form refs/heads/branchname (or refs/tags/tagname). This would make it possible to have tags or branches corresponding to push-only, dry-run, or make-live behaviour.
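
As a sketch of how that might look: the hook receives each pushed ref on standard input and can switch on the branch name (the branch names and behaviours below are purely illustrative, and assume the same permissions arrangement as the earlier hook):

#!/bin/sh
# post-receive reads lines of the form: <oldrev> <newrev> <refname>
while read oldrev newrev refname; do
  case "$refname" in
    refs/heads/production)
      # make-live: unpack and apply for real
      git archive --format=tar "$newrev" | (cd /etc/puppet && tar xf -)
      puppet --modulepath=/etc/puppet/modules /etc/puppet/manifests/site.pp
      ;;
    refs/heads/staging)
      # dry run against a scratch copy: report what would change, change nothing
      rm -rf /tmp/puppet-staging && mkdir -p /tmp/puppet-staging
      git archive --format=tar "$newrev" | (cd /tmp/puppet-staging && tar xf -)
      puppet --noop --modulepath=/tmp/puppet-staging/modules /tmp/puppet-staging/manifests/site.pp
      ;;
    *)
      # push-only: store the ref, deploy nothing
      ;;
  esac
done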

One final idea worth considering would be to have the whole infrastructure managed by Hudson, giving us a simple GUI to drive our Puppet deployment.

Git Puppet: conclusion

Moving away, conceptually, from running Puppet in client/server mode frees us up to deliver a Puppet environment which is faster and more scalable, and opens up interesting creative opportunities. Linking this to a distributed version control system is a comfortable fit. I'm running a number of environments in this fashion, and recommend you give it a go.


About the author

Stephen Nelson-Smith is an e-business consultant and Technical Manager, with a strong background in Linux, UNIX, Python and Ruby. He is currently one of the technical leads and manages the operations team for the UK Government’s National Strategies website - one of the largest Drupal deployments in Europe. He specialises in system monitoring and automation, and delivering value by mentoring and training in lean and agile principles. You can read more about his ideas at http://agilesysadmin.net.

Different node configuration?

How do you handle specific configuration for different nodes with this approach?

For example:
* webserver nodes
* mail nodes

Re: Different node configuration?

I’m sure Stephen will answer this more fully, but thanks for the question - the configuration that Puppet applies to each node is determined by its name, so if node ‘foo’ is a webserver, then your node declaration in Puppet might be something like:

node foo {
  include webserver
}

where webserver is a class defined somewhere in your manifest which contains all the configuration for Puppet to apply to the webserver. And so on for other types of machines. Puppet can match several node names in one declaration, like this:

node foo, bar, baz {
  include webserver
}

or with a regular expression:

node /www/ {
  include webserver
}

With Stephen’s git setup, each node receives a copy of the complete Puppet manifests for all nodes, and the Puppet process running locally will determine which configuration should be applied, rather than the Puppetmaster as with a conventional infrastructure.

Follow-up question

Thank you for this writeup and for the extra clarification. There is however one thing I still do not quite understand.

When you say "and the Puppet process running locally will determine which configuration should be applied" in your clarifying comment, how does that actually happen? I see a straight call to puppet with a modulepath pointing at all the definitions. I'm assuming that the file site.pp will be processed, which imports all nodes under nodes/ via import "nodes/*.pp".

Now, with all the nodes imported I would assume that *every* node definition would be applied. This assumption may or may not be correct - I suspect I may be missing some inner detail to puppet.

But - if the assumption is correct - is there a preferred way to handle the case where the server wants to only instantiate one specific node definition? If I start a webserver, for which there is a node called webserver.pp, what would be the best way to ensure only webserver.pp is handled and not any other files under nodes/ ?

Perhaps the server could be told from the beginning what type of node, or what types of nodes, it should run. If so, I’d be curious to know if there is any preferred way to accomplish it.

Thank you
Marcus Pemer

Marcus, You are right that

Marcus,

You are right that Puppet imports all the node definitions, but in order to know which one to apply, it looks at the hostname of the machine it’s running on. So if the hostname is ‘mybox’, for example, it will search for a node definition ‘node mybox’.

You can override this with the --fqdn switch. For example:

puppet apply --modulepath=/etc/puppet/modules /etc/puppet/manifests/site.pp --fqdn otherbox

storeconfigs

A very appealing idea.

I suppose this makes storeconfigs useless though? AFAICT, the storeconfigs information only gets put into the database when a node’s manifests are compiled, which only happens when the client contacts puppetmasterd, which then requires a central server again?

Re: storeconfigs

I believe you’re right. But if you use this setup, you probably don’t need storeconfigs.

Re: storeconfigs

I was thinking of NAGIOS host and service exports… would there be a different way to automagically import other hosts’ service definitions into the NAGIOS host’s config?

Storeconfigs...

Storeconfigs is very useful for monitoring as well as other items like pull-based backups, automatically populating DNS with host names, and so on.

If there's no way to re-create this then a very powerful facility of Puppet is lost by using this method. And without some return path from all the clients - by doing something like checking in a generated Puppet manifest for the server to use - there's not going to be any communication back? Sounds pretty hacky to try and get something working for this :/

if you’re using

if you’re using mysql/postgres as storage backend, probably all you need is a central database server.

Still needs a puppetmaster...

Don’t you still need a puppet server though, as that is what does the storing and filtering of the stored configs? Running puppet locally without a central server, whether you have a central database server or not, won’t get you this.

you're right

it would be possible, however, to use a local puppetmaster on each of the machines…

Unclear bit

This article is great, and actually describes the scalable, on-demand push configuration I looked for.

My main question is: exactly what Puppet functionality is lost with such an approach?

Also, how is it possible to track the deployment progress on the Puppet-managed machines, for issue detection?
Puppet has reporting, but that again requires the Puppet server to be running.

Perhaps the Hudson idea can solve it - each deployment task would be similar to a unit test?

Nice logic for controlling

Nice logic for controlling puppet and making it simple to deploy.

Setup problem

Hi,

I am trying to perform the same setup as detailed in the document. However, I am facing certain issues.

- 3 machines: LocalMC - Hub - Client
- I was able to set up the LocalMC-to-Hub configuration by committing the code to the Hub machine.
- From the Hub machine, in the /var/git/puppet directory, I do "git push webs master". The files get copied to the target Client machine, but no execution happens as in the example above. You mentioned yarg as the target machine; I modified this to match my client hostname, but no luck.
- Also, do I execute the push as the git user?

I would like some help regarding the setup

Thanks
 Kevin

Git hub set up

Ok, I am a bit of a git newb, and am much more familiar with subversion. My question is what do you have to put in your .git/config to have the hub server auto push to your spokes?

I have everything set up and working when pushing from the developer tree on a workstation to the central hub. I have everything set up and working when pushing from the hub server to the spoke puppet servers. I would like to cut out a step and have the hub automatically push to the spokes.

Would another hook be required on the hub-server, something like ‘git push web master’? Or is there something you can place in .config to auto push any hub updates to the spokes?

To answer my own question for

To answer my own question for anyone else wondering…

You can put a ‘post-update’ hook on your hub server to auto push to the spoke servers.

Assuming you followed this guide and used ‘web’ as the name for your remotes, you can put this in .git/hooks/post-update.

#!/bin/sh
git push web master

Configuration Problem help!!!

Hi,

I followed the steps given above, however I have a few queries about the setup.

- Local machine: the commits are made from /etc/puppet
- Hub machine: the code lies in /var/git/puppet/
- Spoke machine: the code lies in /var/git/puppet/

The final step of deploying the post-receive hook doesn't work, and gives these errors:

#!/bin/sh
git archive --format=tar HEAD | (cd /etc/puppet && tar xf -)

tar: manifests/nodes: Cannot mkdir: Permission denied
tar: manifests/nodes/yarg.pp: Cannot open: No such file or directory
tar: manifests/site.pp: Cannot open: File exists
tar: manifests: Cannot utime: Operation not permitted
tar: modules/files: Cannot mkdir: Permission denied
tar: modules/files/sudoers: Cannot open: No such file or directory
tar: modules/sudo: Cannot mkdir: Permission denied
tar: modules/sudo/files: Cannot mkdir: No such file or directory
tar: modules/sudo/files/sudoers: Cannot open: No such file or directory
tar: modules/sudo/manifests: Cannot mkdir: No such file or directory
tar: modules/sudo/manifests/init.pp: Cannot open: No such file or directory
tar: modules/sudo/manifests/install.pp: Cannot open: No such file or directory
tar: modules/sudo/manifests/sudoers.pp: Cannot open: No such file or directory
tar: modules: Cannot utime: Operation not permitted

I also have a query: does the post-receive hook have to be changed on the "Hub" or on the "Spoke"?

Can you point out what I am doing wrong? Also, how does it execute the manifest - is it the last step mentioned above, regarding the Puppet module path?

Regards,
 Kevin

copying ssh private keys is a mistake

Nice write up — thanks :-)

One minor nit to pick — there’s never any excuse for copying ssh private keys around the place. You should have a separate key for every user/machine pair from where someone might control the system, and a different one on the central server. This allows for the one key to be compromised without nearly as much pain. It also allows one to do things like restrict the usage of keys so that an attempt to use the server key from a laptop on the WiFi network won’t get you anywhere, say.

It also occurs to me that the setup you are suggesting allows an attacker to gain control of the whole network by owning the least protected machine, and pushing changes into the master. One could address that by requiring that commits be GPG signed before they are deployed — this would also allow one to have a setup where a junior sysadmin pushes changes in, but they are only deployed more widely once they’ve been signed off by a more senior admin. The more I think about this, the more essential it seems to me — which is a shame, as it means I’m going to need to explain GPG usage to some Windows users if I’m going to deploy it in this way.

Cheers, Phil.

You do not have to explain

You do not have to explain GPG usage. Digital signing is much more straightforward. In fact, any person who needs these things explained to them should not be pushing configurations around.

Puppet-sync

Hello Folks,

just wanted to let you know, I wrote a nice ruby script which should come in handy to ease the workflow of committing and syncing to the puppet master. Check out my blog post about it here:

http://sts.ono.at/blog/2010/12/22/synchronize-puppet-with-git/

Very late to the party here,

Very late to the party here, obviously, but thanks for the write-up.

I’m having a very hard time understanding the real-world benefits of all of the extra complexity for the sake of git “push to remotes” vs. pull via cron from each node. It’s suggested that it “makes us feel more in control”, but I find that to be bunk. In the pull scheme, you have the same control: You develop and test a working setup, then PUSH it to the single repo with a “production” tag (which is pulled from the cron job). It’s not a matter of simplistic preference — one solution is far more simple to configure and administer than the other from all I can tell.

I would love to hear from others on the topic about anything I am missing here in my thought.

RE: Late

Right:

Why DVCS?

We have a directory of files which, as time goes on, will change. We wish to distribute this directory, possibly with significant changes, to a subset or the full set of our machines. Also, perhaps we would like to distribute the administration of this directory, so that others can suggest various improvements which we may, or may not, incorporate.

That’s the DVCS use case. Git is popular, git branches are quick, git itself is quick. Git supports ssh and https out of the box. Most distros have ssh installed, libcloud and friends provide access to cloud servers, many of which have ssh installed.

Unix:

"Do one thing and do it well."
Puppetmaster is a fileserver. It passes certs around, which complicates things - really, plain HTTPS would serve here, and of course SSH keys are ideal. It is exceptionally well written but still suffers from performance issues, so one is advised to install Passenger and Rack.

I don't much care for ruby, I don't use apache anymore and I don't use rack. I don't want it on my systems just to push a manifest to a machine and run a command on it. Git is far, far faster, and is by nature distributed. Also, being a coder, although not a ruby coder, I am familiar with git.

Branching: I'd seek to replace the node concept with a branch. There are other ways to do this - @jordansissel comes to mind.

I love puppet as a tool, and I would not want to second-guess things, but puppetmaster does not provide compelling reasons for its use. Even with the same complexity, git is faster, and only requires git. Obviously I have my own use cases… ;)

I think maybe I wasn’t clear.

I think maybe I wasn’t clear. I get git. I get subverting puppetmasterd for its native performance issues. I’m saying why all the complexity for “git push to remotes” (in this article) when you can just use git in pull mode on the client node.

“The” client node might

"The" client node might actually be more like 1,000 client nodes. I'd rather push from one repo to 1,000 remote nodes than log in to each one individually and perform pulls from them.

Doing a git pull from cron is

Doing a git pull from cron is another option. In fact I’m doing this in production for four or five hundred nodes right now.

Using cron to pull puppet manifests

I don't know why someone wouldn't use cron. With cron you don't have to worry if a machine isn't running at push time, and you don't have to worry so much about firewall rules, because you're pulling through the firewall instead of pushing.

I will be running several thousand puppet clients and it seems the only way to get it to scale the way I want is to have one central git server serving puppet manifests and then have git “mirrors” around the world that are just keeping in sync with the central git server. The puppet clients will sync to the mirrors. In the cron job script I could have it check round trip times for each and go with the fastest one. Combine that with YUM mirrors for packages and we’re probably in good shape.

Do you see any issues with this setup John Arundel? I’m using your books to plan this out.

Sure, that’s pretty much what

Sure, that’s pretty much what I do for most sites.

Re:

I started out with cfengine because Puppet had almost no documentation at the time and was not to be found in any system administration reference. Recently I circled back and looked at Puppet. Puppet has a much cleaner syntax than cfengine. In the cfengine world, you have phases such as "check packages", "check files", and "run these commands". You have to decide in which order to do things, and it starts getting to be a pain if you want an action in one phase to trigger an action in another phase, such as "if I have to install the httpd package, make sure it's started in runlevel 3" (last I looked, cfengine didn't have a native command for starting programs, so you have to do it through the shell). Puppet, on the other hand, breaks things down into modules divided per application, and has many more commands to handle things that you'd be doing through shell or file editing in cfengine.

There are a few places where (IMHO) cfengine is much better than Puppet. One is in file editing. There are many commands within cfengine to handle operations on files, such as "add this line if it doesn't exist", or "comment out this pattern if you ever see it". In Puppet you have to hope that there is a plugin/module to handle the service you're trying to edit, or you have to dive into templating. Secondly, cfengine makes it very easy to classify a server, such as "if you see this file, it's a NIS server, otherwise it's a client" (which lets you alter behaviour later on in the configuration). In Puppet you have to write and distribute your own facter recipes or do some other fancy stuff. Finally, cfengine has some basic monitoring built in, such as letting you run actions based on disk usage or other system metrics. Puppet just doesn't have anything that I could find to compare.

Despite all that, I'd suggest going with Puppet. It will eventually catch up to the features of cfengine. For the things that are really easy in cfengine but more difficult in Puppet you'll just learn to adapt because you won't know any better :P

You might look at Puppet’s

You might look at Puppet’s Augeas support for detailed editing of config files.
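
For example, a one-setting sshd change with the augeas type might look like this (the file and setting are chosen purely for illustration):

augeas { "sshd-permit-root-login":
  context => "/files/etc/ssh/sshd_config",
  changes => "set PermitRootLogin no",
}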

what's the current status for git puppet

I'm new to Puppet and have just ordered your Puppet cookbook, and Git Puppet seems a great idea. What's the latest status of this approach, and what has the community's feedback been?

I have struggled with SSL setup several times myself.

Yes, several of our clients

Yes, several of our clients have migrated to a Puppetmasterless infrastructure and it's working great. I just set up Puppet for a new client using Git from scratch and didn't set up a Puppetmaster at all. Unless you need storeconfigs, it's an excellent solution.

Great to see puppetmasterless

Great to see that puppetmasterless works well. As a beginner, though, I find it slightly difficult to use straight away due to the lack of information, since most examples rely on agent/master mode - the file source protocol, reporting and so on.

It would be nice if you could recommend some places or blogs for puppetmasterless tips.

Adding some hints for this solution in your Puppet Cookbook, alongside the usual agent/master examples, would be a bonus for me.

Anyway, good to see this. It's just the start of my Puppet learning.

Multirepo SSH: Halts when one push fails

I have been using this multi-push over SSH method with git since I first read about it and have been very happy with it. Today though, I encountered what may be a significant flaw. When one of the remote servers is down, the git push halts and does not attempt any further connections.

From your example, if “matisse” is down, initiating a “git push webs” seems to result in:
ssh: connect to host matisse: Connection refused
fatal: The remote end hung up unexpectedly

And no attempts to push to “picasso” or “klee”. Have you experienced this behavior? If so, have you found a work around?

Excellent post made me buy the book

And the book is even better. Thanks.

It’s kind of laughable that

It’s kind of laughable that you seem to consider using version control a big revelation.

Is this still the best way to scale?

Hi,

Not sure if anyone is still monitoring/replying to this post, but I have some questions about this git method of scaling - and whether some of the perf concerns have been addressed over the last 4 years.

For info - I’m looking at converting an internally developed system using cron’d shell scripts hooked back into Git for updates for puppet. This is over several thousand servers.

1. Return path. There doesn't seem to be a way of monitoring and reporting on successes, failures and out-of-syncs within an estate. Puppetmaster seems to provide this, but these decentralised/isolated islands of Puppet mean that signals don't get back. I guess you could use SNMP/HTTP to send back reports, but this doesn't seem ideal to me. Has anyone implemented a dashboard using this setup?

2. Push performance. If I've read this right, the hub notifies the spokes via a git push (multiple remotes) - how does this perform? These are serial operations, so pushing to a large number of servers would take some time to complete?

3. Git management. Linked to the above - as each server is a remote within a configuration file, this seems like a potential problem area. Just managing this - removing old servers, adding new ones, checking whether servers have been added, reporting on what servers are available - all through the command line?

I like Git, and Puppet sounds great (and yes, I have the books ;-) ), but centralised reporting is a must for me, as I need a way of identifying (and fixing) failures and also reporting compliance, so that we can see how successful this process is.

Thanks
