The Puppet Cookbook takes you beyond the basics to explore the full power of Puppet, showing you in detail how to tackle a variety of real-world problems and applications.
General sysadmin topics.
The ever-amusing Scott Adams with his take on how to boost one’s professional credibility.
“We’re having a problem sending email out of the department.”
“What’s the problem?” I asked.
“We can’t send mail more than 500 miles,” the chairman explained.
I choked on my latte. “Come again?”
“We can’t send mail farther than 500 miles from here,” he repeated.
“A little bit more, actually. Call it 520 miles. But no farther.”
“Um… Email really doesn’t work that way, generally,” I said, trying
to keep panic out of my voice.
More from Agile Testing’s Grig Gheorghiu on setting up Opscode Chef, creating your own cookbook, modifying an existing cookbook, creating a role and adding a client machine to that role.
Excellent introduction to Puppet for cloud management by Jeff Wallace, featuring EC2 and Rackspace Cloud integration, Puppet classes, using Ruby logic in ERB templates, and inheriting node definitions to create identical configurations.
cucumber-puppet is the glue between cucumber and Puppet, allowing you to write behavioural tests, or features as cucumber calls it, for your Puppet manifest.
It’s been said that deploying Java apps is hard for Linux packages, but Puppet makes it very easy. This is only the tip of the iceberg—you can use the same tool to deploy mailservers and databases as well as appservers. It fits in well whether you have 20 machines or thousands. It’s agnostic to cloud vs physical hardware, and plays nicely in all places. The example that follows below is designed to be executed locally, though in a typical deployment, you’ll host it on a central server, called a puppetmaster, and then roll your configuration out to the nodes using puppet.
Video of Stephen Nelson-Smith speaking about Git for sysadmins - how it works, how to use it, how to support it, and how it compares to other version control systems.
Gizzard is a Scala framework that makes it easy to create custom fault-tolerant, distributed databases. At a high level, Gizzard is a middleware networking service that manages partitioning data across arbitrary backend datastores (e.g., SQL databases, Lucene, etc.). Nick Kallen from Twitter Engineering outlines what Gizzard is and how it works.
pigz, which stands for parallel implementation of gzip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data. pigz was written by Mark Adler, and uses the zlib and pthread libraries. John Allspaw notes that “When you’ve got massive files, this can be a pretty big advantage, especially when you’ve got lots of cores sitting around.”
Twitter recently switched from an Apache/Mongrel web infrastructure to one based on Unicorn, a high-performance load balancing application server, and Stormcloud, a Unicorn monitoring and management system. Ben Sandofsky and John Adams from Twitter Engineering explain the benefits.
Agile Testing’s Grig Gheorghiu provides a simple, step-by-step tutorial for installing Opscode Chef and getting a Chef client to talk to a Chef server.
Julian Simpson (aka @builddoctor) is a well-known authority on software build and deployment issues, including configuration management, continuous integration, continuous deployment, and testing. His blog covers all of these topics, featuring talks, interviews, links, and patterns and practices.
Build, deployment and infrastructure superdude Julian Simpson has a job site specifically for software build and deployment professionals. Very useful if you’re looking for a position in the industry - or if you want to advertise a vacancy.
Vagrant is a tool for building and distributing virtualized development environments, providing easy to configure, lightweight, reproducible, and portable virtual machines targeted at development environments. Vagrant includes automated virtual machine creation using Oracle’s VirtualBox, automated provisioning of virtual environments using Opscode Chef, port forwarding to the host machine, full SSH access to created environments, shared folders, packaging environments into distributable boxes, and easy teardown and rebuild of your environments.
Poul-Henning Kamp is the author of Varnish, an open-source HTTP accelerator used by Facebook, Slashdot, Wikia and many other sites to speed up slow Web servers. In this article for ACM Queue he describes how he applied his experience as a lead kernel developer on FreeBSD to developing a new, ultra-high-performance algorithm for cache expiry, and discusses how to optimise traditional algorithms for virtual-memory execution environments.
We have been doing various experiments in our EC2 web serving cluster to serve maximum traffic at the minimum cost. I thought our experience would be useful to many other people using EC2.
“Wouldn’t it be nice if we could abstract some of the low-level details of different socket types, connection handling, framing, or even routing? This is exactly where the ZeroMQ (ØMQ/ZMQ) networking library comes in: “it gives you sockets that carry whole messages across various transports like inproc, IPC, TCP, and multicast; you can connect sockets N-to-N with patterns like fanout, pubsub, task distribution, and request-reply”.” Ilya Grigorik explains the architecture of ZeroMQ with some code examples and diagrams.
Applying agile and lean methodologies to system administration.
For me, what is really exciting about DevOps is the notion that software development, infrastructure engineering and operational automation can and should be done simultaneously and collaboratively. DevOps doesn’t invalidate ITIL, nor does it mean unbridled application deployment.
The way I see it, we already have some great examples of DevOps in action; it’s just that the term hasn’t been applied to them yet.
Can you remember the last time you had to apply patches or config file changes to a system? Did you have that fingers-crossed feeling? Wouldn’t it be great if you could install a patch and run a series of tests to see if everything behaved the way it should?
I’ve been managing systems teams in an Agile environment for a number of years, and after thought and experimentation, I can recommend using an approach borrowed from Lean systems management, called Kanban.
Here is a detailed example of a fairly typical 2-tier Kanban board, for teams that know the basics of Kanban and are taking their first steps towards implementing it in practice.
Agile training expert Robert Dempsey examines the application of Kanban techniques to system administration work, with a devops flavour.
Technologies and architectures for data storage and retrieval.
Cassandra is a highly scalable NoSQL database solution. Jonathan Ellis demolishes some misconceptions about Cassandra with a look at its replication capabilities, reliability, and interoperability with Hadoop and similar tools.
A review of the various HA solutions available for MySQL.
Handy notes and tips on squeezing more performance out of your MySQL server.
Hive is a data warehouse infrastructure built over Hadoop. It provides tools to enable easy data ETL, a mechanism to put structures on the data, and the capability for querying and analysis of large data sets stored in Hadoop files.
David Mytton of BoxedIce explains how his MongoDB infrastructure works and gives some tips and tricks based on experience running MongoDB in production for several months.
SQL databases are fundamentally non-scalable, and there is no magical pixie dust that we, or anyone, can sprinkle on them to suddenly make them scale.
In this FutureRuby talk, Ilya Grigorik explores Tokyo Cabinet’s features such as the key-value store, ordered traversal, attribute search, schemaless data structures, indexing, and scripting with Lua.
Deployment of applications, version control, and continuous integration and nightly build resources.
The two most important parts of this talk are the observation that Dev, QA and Operations teams have to blend into each other slightly to achieve deployments at such a velocity, and the fact that they are not afraid to break the website by deploying code from trunk.
How to set up continuous integration testing for Puppet manifests.
It’s important to realize that the tools you use are largely independent of the integration strategy you use. Although many people associate DVCSs with feature branching, they can be used with CI. All you need to do is mark one branch on one repository as the mainline. If everyone pulls and pushes to that every day, then you have a CI mainline. Indeed with a disciplined team, I would usually prefer to use a DVCS on a CI project than a centralized one.
Collaboration and mutual respect between developers and sysadmins.
Deployment is definitely one of the places where the rubber meets the road. In some organizations, deployment of new code can be one of the most stressful and divisive parts of their work.
Devopsdays was a small conference about a couple of emerging themes combining Development and Operations.
Hilarious Monty-Python-mashup video explaining the devops movement.
Ten characteristics of devops-flavoured sysadmin practices, as outlined by Dmitriy Samovskiy, including coding skills, protection of business revenues, a focus on stability and uptime, and distributed or hyper-distributed applications.
If your dev team is fixing servers, they are not programming. If they are not programming, they may not be making you money. Many developers needlessly use versions of software that are not part of the standard OS. Web developers often lack the experience to know the accepted standards. As a result, you may end up with “odd” deployment scenarios which can complicate management.
General resources on infrastructure and network architecture.
Using these tools should help you build more reliable and less noisy cron jobs, which makes your systems more reliable and your pager more quiet.
A “design pattern” is a little bit of knowledge that is worth sharing, and useful for repeating. For example, the design pattern of “a network” or “active-active redundancy” or “RAID-0” or “helpdesk”. Ever visit an IT shop where there was no way for people to get help? I have. So I explained the design pattern of a “helpdesk” and without having to re-invent the wheel, they were able to implement one.
When planning the storage for a system one thing you need to know above all other information is how that data is going to be accessed. I’m talking about 95% reads 5% writes, 1.2Gb/minute average transfer, highly latency sensitive. You can make some assumptions based on the applications that’ll be accessing the storage, but if you really need to know, the only way to find out is measuring.
Once you know how that data is going to be accessed, you can build or provision its storage accordingly. Knowing how likely the dataset is to grow is also something you need to know, but that’s a luxury we often don’t get. And for the love of performance metrics, don’t forget peak loading and behavior under fault conditions.
Cloud computing and cloud-based infrastructures.
Scalr is an auto-scaling solution for EC2, a bit like RightScale, but cheaper. They give you some nice infrastructure like a MySQL master/slave replication setup, and distributed cron jobs.
We switched to Amazon EBS a few months ago, and since then we have been able to get a good night’s sleep without worrying about our database disappearing. Here’s a quick guide.
Automation is the underlying foundation of the revolution (some say disruption) in Information Technology that is currently happening and has been branded as cloud computing.
After you rebundle a running instance to create a new image, you can then run new EC2 instances of that image.
At OpenX we recently completed a large-scale deployment of one of our server farms to Amazon EC2. Here are some lessons learned from that experience.
Heroku is the instant ruby platform. Deploy any ruby app instantly with a simple and familiar git push. Take advantage of advanced features like HTTP caching, memcached, rack middleware, and instant scaling built into every app. Never think about hosting or servers again.
Many people know that Amazon Web Services are one of the big players in the cloud computing business, and especially their Infrastructure as a Service offering EC2 is becoming increasingly popular. Few people know that EC2 is probably one of the biggest Xen installations deployed. But how many know how EC2 actually works and how the underlying architecture is constructed?
Resilience, uptime and high-demand strategies.
Migrating Drupal sites without downtime using an ingenious combination of DNS switchover and proxying.
The idea is to allow through only as much traffic as your system can handle successfully.
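One common way to cap admitted traffic is a token bucket. The sketch below is only an illustration of that general idea, not the method used in the linked article; the class name, the `rate` and `capacity` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Admit roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical helper for illustration; not taken from the linked article.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a request may pass, consuming one token."""
        now = self.clock()
        # Refill for the time elapsed since the last call, never past capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A front end could call `allow()` once per incoming request and answer with an error or a queue page whenever it returns False, so the backend only ever sees load it is sized for.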
Everyone knows that problems always seem to happen when you are asleep, on holiday or away from your computer.
That awesome load-balanced, redundant, no-single-point-of-failure stack you’ve built? Yeah, doesn’t do you much good when the lights go out. In my experience, the worst, most sustained downtime has always been caused by power issues.
Why Github uses ldirectord for high-demand load-balancing.
Preparing your infrastructure for unexpected traffic spikes and ‘Slashdotting’, with caching, proxying and denormalisation of data.
Tools and techniques for monitoring your systems.
Flapjack is a scalable and distributed monitoring system. It natively talks the Nagios plugin format, and can easily be scaled from 1 server to 1000. Flapjack aims to be simple to set up, configure, and maintain.
We’ve created a website monitor, in plain English, which genuinely checks the behaviour of a website, and which returns data in the form of a standard Nagios plugin. Suddenly the barrier to doing truly intelligent website monitoring has been reduced, very significantly.
This is a detailed howto article, but you can skip the parts that you don’t need easily, and it will get you up and running with an enviable Nagios Drupal Monitoring station.
Monitoring “is it down?” is reactionary. It is better than no monitoring at all, but all it tells you is that there is already a problem. Monitoring is better when it predicts the future and prevents problems.
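As a rough illustration of monitoring that predicts rather than reacts, the sketch below fits a straight line to recent disk-usage samples and estimates how long until the disk fills. The function name, the sample format, and the linear-growth assumption are all hypothetical choices for this example; real usage rarely grows this neatly.

```python
def hours_until_full(samples, capacity):
    """Estimate hours until `capacity` is reached, from (hour, used) samples.

    Fits a least-squares line to the samples and extrapolates from the most
    recent reading. Returns None if there is too little data or usage is
    not growing. Hypothetical helper for illustration only.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return None  # all samples taken at the same time
    # Slope of the fitted line: growth in usage per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / spread
    if slope <= 0:
        return None  # flat or shrinking usage: no predicted fill time
    _, last_used = samples[-1]
    return (capacity - last_used) / slope
```

A check built on this could alert when the estimate drops below, say, 48 hours, which leaves time to act before anything is actually down.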
Non-functional requirements are essentially the often unspoken desires of a product owner or user about how a piece of software works, rather than what it does. We don’t really mean how the software itself behaves, as this can still be captured in behaviour-driven design and acceptance tests; rather we mean aspects of its behaviour over the lifecycle of the application.
When monitoring a Web site, you need to look at it both from a ‘micro’ perspective (i.e. are the individual devices and servers in your infrastructure running smoothly?) and from a ‘macro’ perspective (i.e. do your customers have a pleasant experience when accessing and using your site?; can they use your site’s functionality to conduct their business online?).
Performance tuning and high-volume techniques.
Masterzen explains how to offload the job of Puppet static file serving onto Nginx, a small and performant web server, and also how to have Nginx cache the files. He also gives a recipe for configuring Nginx to cache the compiled catalogs for each machine, reducing some of the compute load on the Puppet server.
Autobench is a simple Perl script for automating the process of benchmarking a web server (or for conducting a comparative test of two different web servers).
There are many ways to scale modern web applications. What I will be describing here is the method that we chose. This should by no means be considered the only way to scale an application. Consider it a case study of what worked for us given our unique requirements.
EXT3, EXT4, XFS, ReiserFS, and Btrfs benchmarked.
How to use vmstat, iostat, and top to understand what part of your system is the bottleneck.
Insight into the process of building scalable internet based services.
iperf is a great little tool for quickly measuring the network bandwidth between two machines.
Provisioning and building new systems.
When looking at bootstrapping an infrastructure, people opt either for taking or creating full images of an existing environment, or for “scripting” automated installations. Both have their advantages and disadvantages.
The core definition of “fully automated provisioning”: the ability to deploy, update, and repair your application infrastructure using only pre-defined automated procedures.
A checklist for security and performance tweaks you’ll want to apply when building a new Ubuntu box to host Drupal sites. Most of these will apply to CentOS as well, but the package names may be slightly different.
Virtual machines and VM hosting.
While a lot of effort is spent on automating the installation of the machine OS and its application, I see that the provisioning of a virtual machine is often still done by the GUI. So why not automate that step too?
Linux tuning information is scattered among many hundreds of sites, each with a little bit of knowledge. Virtual machine tuning information is equally scattered about. This is my attempt at indexing all of it.
While new technologies and delivery models have made it much simpler to manage the infrastructure, this is not where our core inefficiencies lie. Virtualization principles must be extended to higher levels of the application stack, to make it easier for all of us to manage, tune and integrate applications.
How to become a better sysadmin.
Version control and other development best practices which sysadmins should use.
For years, IT pros have heard that they must do more with less, as staffing is cut and outsourced, while demands to better serve the business and adopt new technologies continually increase. This is how it’s always been, but it doesn’t have to be how it always will be.
Let’s face it, System Administrators get no respect 364 days a year. This is the day that all fellow System Administrators across the globe, will be showered with expensive sports cars and large piles of cash in appreciation of their diligent work.
Many of my former Go students headed to Berkeley and Stanford, chose computer science as their major, and became software engineers. At a minimum, Go will calm a person down and teach him or her to concentrate for a long period of time.
Lock-picking will teach you to be humble and not focus on what others can do better, but to take an honest inventory of your skills and your progress.
Hardening your systems, and deterring and detecting intruders.
Here are a few things you need to tweak in order to improve OpenSSH server security.
Tools, software and applications for sysadmins.
DNS Knife is a good tool for checking that your DNS setup is OK. It checks the parent servers, checks whether your nameservers are listed on the parent server, checks that all your nameservers are reachable and authoritative, and so on.
A simple command-line tool to be able to grab real-time stats from memcache.
A large crowdsourced list of sysadmin diagnostic tools available on Linux.
Automated configuration management tools and strategies.
When doing software testing, your testing tool is normally separate from the language and libraries you’re building the software with (but almost always written in the same language). When testing your infrastructure, I think it makes perfect sense to apply this practice.
Cloud computing says that building servers is undifferentiated heavy lifting; unless your service is building servers, you should really let someone else do it and focus on the product or service you’re actually trying to sell. Configuration Management is the first step in bringing this same ideology to configuring systems.
Hosts in a well-architected enterprise infrastructure are self-administered; they perform their own maintenance and upgrades. By definition, self-administered hosts execute self-modifying code. They do not behave according to simple state machine rules, but can incorporate complex feedback loops and evolutionary recursion.
Resources to help you learn about Puppet or master more advanced technical issues.
I think it makes sense to use a tool like Puppet for the initial configuration of the OS and of the packages required by your application. When it comes time to deploy your application, I think a tool like Fabric is more appropriate.
So far, my biggest problem with Nagios has been finding the time to add new systems to it, figuring out what services to check, etc. It’s not a particularly difficult thing to do, but in the grand scheme of things, it was just something that always fell by the wayside in the drive to get more systems set up, deal with user problems, and put out the inevitable fires. That is, until recently.
People complain that Puppet is non-deterministic. On a certain level that is like complaining that threads are non-deterministic. That’s the way the model works by design. If there is logic that depends on the order of execution, that code needs mutex/synchronization. Threads create issues, but they also solve some.
Great article on configuring the Puppet server to use Passenger and Apache, with a complete example vhost definition for puppetmasterd.
Setting up a staging environment for your modules, manifests and files on the puppetmaster. Test your modules, manifests, templates, files and facts before deploying on production servers.
Other tools and frameworks for configuration management.
Kokki is a configuration management framework inspired by Chef.
Moonshine is an open-source configuration management and deployment system that follows the Rails way, simplifying server configuration, dependency management, and Rails application deployment, using Ruby and Puppet.
slaughter is a simple tool which will allow policies to be downloaded, via HTTP, from a central server and executed upon a local machine. The intention is that these local policies may be written in a portable fashion and used to automate the administration of a large number of Linux machines.
Sprinkle is based on Capistrano and uses the same push model without any additional infrastructure.
Resources, tips and links for Mac users and those who support them.
Recommended Mac applications.
Movist is a fast and user-friendly media player for OS X, based on FFmpeg and QuickTime.
SynergyKM is a GUI wrapper around the synergy command line tool that lets you easily share a single mouse and keyboard between multiple computers with different operating systems without special hardware.
For more than ten years I’ve operated a string of one-man businesses. My model is: keep it lean, hire no one, and outsource very little.
If you own a small business, you have probably entered into a contract with a consultant. Most people don’t realize that every consulting agreement should contain a minimum of nine elements. If those elements aren’t included, you will probably not have a successful result.
Unconsciously, everyone expects a startup to be like a job, and that explains most of the surprises. It explains why people are surprised how carefully you have to choose cofounders and how hard you have to work to maintain your relationship. You don’t have to do that with coworkers. It explains why the ups and downs are surprisingly extreme. In a job there is much more damping. But it also explains why the good times are surprisingly good: most people can’t imagine such freedom. As you go down the list, almost all the surprises are surprising in how much a startup differs from a job.
Tips, tutorials and howtos relating to Drupal.
The various Drupal database tables and what they do.
If you are running more than one Drupal site, you can simplify management and upgrading of your sites by using the multi-site feature. Multi-site allows you to share a single Drupal installation (including core code, contributed modules, and themes) among several sites.
This is a non-technical introduction. It’s more targeted to people who have no idea on what Drupal is and what makes it so great.
We’ve all heard Drupal can run every site from your personal blog to massive social networks. The framework is flexible and powerful enough to do anything. The showcase list of Drupal sites is impressive and growing.
Modules, code and hacks for Drupal.
Drucumber is a new module that converts native-language-like text into Drupal Simpletests. The goal of this module is to allow end-users and project teams to implement behavior-driven development: specify behaviors or features of a Drupal site in a test and never again worry about checking them.
In the two years since I’ve used Mollom, the service probably has blocked more than 100,000 pieces of spam from being posted at my site.
Basically it adds a hidden form element which your users won’t see, but automated spambots will. If anything is placed in the textfield, validation fails. So instead of making the user prove that they are human, we allow the spambot to betray itself as a bot.
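The server-side half of that honeypot check can be sketched in a few lines. This is a generic illustration rather than the Drupal module’s actual code; the function name and the `website` field name are assumptions chosen for the example.

```python
def is_probably_bot(form, honeypot_field="website"):
    """Return True if the hidden honeypot field was filled in.

    `form` is a dict of submitted field names to string values. Humans never
    see the honeypot field, so any non-blank value suggests an automated
    submission. Hypothetical helper, not the module's real implementation.
    """
    return bool(form.get(honeypot_field, "").strip())
```

A form handler would reject the submission when this returns True, and process it normally otherwise.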
Performance tuning, caching and scalability topics for Drupal.
Taking advantage of Nginx’s direct Memcache integration by delivering Drupal pages from cache - even for authenticated users.
There are many ways to improve the performance and scalability of Drupal. This is a comparison of a selection of performance and scalability modules.
Lifetime Digital case study highlights the proven scalability and performance of Drupal websites in a high traffic environment. By utilizing Akamai technology for caching of rich media and for anonymous users, Lifetime Digital’s 6 Drupal sites have proven successful in supporting 50 million page views per month.
As anyone who has developed a Drupal site with devel module’s query logging on can tell you, a Drupal implementation can quickly get out of control when it comes to hammering the database.
Search engine optimisation and web marketing techniques.
Sales-based sites are where SEO really comes into its own in terms of return on investment, and it literally is the case that even the smallest tweaks can result in real increases in revenue. So here are seven ways to help transactional e-commerce sites boost their search rankings.
Follow @bitfield on Twitter here: http://twitter.com/bitfield