Abstract
In many classical IT environments, several instances of a given system exist at the same time: production, several UAT and/or SIT systems, a development system, etc.
To keep those systems synchronized, procedures are often set up (delivery notes, hand-over forms, etc.) that merely describe how to deploy the changes needed to move from release X to release Y.
In my opinion, those procedures bring too much rigidity and lead to long-term divergence of the systems. You have certainly faced one of the following situations:
- After a production problem, a quick fix is applied directly in production (and never mirrored to any test system)
- The development team has delivered a nice binary, but did not think about housekeeping crontabs… The production team writes them in its own scripts, but the change is never reflected back on the test systems
- The project manager and half of the management board are on the IT team's back… Developers push the binaries in a hurry, the production team integrates them during the night and, yes, the deadline is met, but nobody knows what was done to get it running…
The goal of this blog post is to present an approach that does not break the procedures in place, but brings flexibility and allows auditing all changes (who made which change, and when), even in the extreme cases described above.
Git will be used as the main tool to guarantee that systems are synchronized. Git was designed for another purpose (distributed source control), but as you will see, it is the perfect tool to keep our systems synchronized.
Goals
To simplify the presentation, I will assume only two systems: a test system and a production system. The same tools can be used to synchronize more systems.
Our integration of git must guarantee that:
- Environments are not connected to each other: all changes must be packaged, and each package is handed over from one environment to the other (e.g. from test to prod)
- Changes usually flow from development to production, but it must be possible to make a change in the production environment and integrate it in test (symmetric approach!)
- Rollback must be straightforward
- All changes on any system can be viewed easily: we want to know, with a single command,
  - who applied a package
  - when the package was applied
  - a summary of the changes brought
- We must be able to know at any time whether the system has been modified since the last release, and which modifications were made
- Some files must be tracked, but some files must be ignored by the system (because we know they will differ between systems):
  - log files
  - some configuration files (e.g. JDBC connection strings…)
  - data files of the local RDBMS (we are just synchronizing filesystems)
Setting up system
Configuring .gitignore file
.gitignore determines which files and directories are tracked. Ignored files are typically system files, log files, some configuration files, etc.
For example, imagine an application (“seriousapp”) running on a Unix system:
├── dev
├── etc
│   └── seriousapp
├── home
│   ├── frank
│   ├── harry
│   ├── jenny
│   └── seriousapp
│       ├── appserver
│       │   ├── app1
│       │   │   └── log
│       │   ├── app2
│       │   │   └── log
│       │   └── app3
│       │       └── log
│       ├── logs
│       └── tools
├── lib
└── usr
The application is mostly deployed under /home/seriousapp, but it also has a configuration directory in /etc/seriousapp. We want to keep track of changes in seriousapp, but changes that ‘jenny’ makes in her home directory (/home/jenny) must not be detected by the system. Some directories (*/log, */logs, usr, lib, …) must not be tracked either.
The approach to build .gitignore is the following:
- exclude everything via .gitignore (-> wildcard *)
- explicitly include application directories (-> all application directories are tracked)
- in application directories, add an extra .gitignore that either includes or excludes specific files or directories
In our previous example, we would have the following :
.gitignore contents :
*
!home/seriousapp/
!home/seriousapp/*
!etc/seriousapp/
!etc/seriousapp/*
!.gitignore
!*/
home/seriousapp/.gitignore and etc/seriousapp/.gitignore contents :
(NB: without this line, the contents of subdirectories will be ignored)
!*
home/seriousapp/logs/, and all other log directories also contain a .gitignore file :
*
!.gitignore
OK – that was the hardest part – from now on, we will use the power of git to assist us !
Initialize the system
First cd to the root directory, then issue the following commands:
git init
git add .
git status # just check that all the expected files are listed here!
git commit -m "Release 1.0 (JIRA1234,JIRA3322,JIRA3224)"
These commands create a .git directory that holds a full copy of your files, along with metadata. You can pass any string to the commit command, but it is good practice to include the official release number, as well as a reference to all changes.
Execute these commands on a single machine only. Once this is done, you can copy the “.git” directory over to all the other machines you want to synchronize with.
After copying the “.git” directory, you can run “git status” on the target system to list all differences between the original and the target system. For the sake of simplicity, I will assume there is no difference yet. In real life, this is the moment to synchronize the files manually…
You can also copy the “.git” directory to an empty place, and issue the following command to create a brand-new copy of the original system :
git checkout HEAD -- .
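As an illustration of this seeding step, here is a small sketch that plays both roles in temporary directories: a "source" system is committed, only its .git directory is copied to an empty "copy" directory, and the files are recreated from it. The directory names and the placeholder git identity are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: seed a second system from a copy of the .git directory (temporary directories only).
set -eu
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source/etc/seriousapp" "$sandbox/copy"

cd "$sandbox/source"
echo "CONNECTION_TIMEOUT=5000" > etc/seriousapp/application.properties
git init -q
git config user.name "sandbox"                    # placeholder identity
git config user.email "sandbox@example.invalid"
git add .
git commit -q -m "Release 1.0"

# Copy only the repository metadata, as described above.
cp -R .git "$sandbox/copy/.git"

cd "$sandbox/copy"
# The working directory is still empty, so every tracked file is reported as deleted.
git status --short
# Recreate the working files from the last commit.
git checkout HEAD -- .
remaining=$(git status --porcelain)
restored=$(cat etc/seriousapp/application.properties)
```

On a real target system that already contains files, the first "git status" is the interesting part: it lists exactly where the target differs from the original.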
In the next section, I will show you how to propagate differences.
Keeping systems synchronized
OK – now you have two (or more) systems initialized with the same contents. Let us imagine that some modification has to be made on the original system…
Thanks to git, you don’t have to take any precaution (like creating a backup file) before you make a modification. Just edit the files, and you’re done. For example, imagine that you change a timeout in etc/seriousapp/application.properties :
echo "CONNECTION_TIMEOUT=10000" >> etc/seriousapp/application.properties
After you’ve made all your changes, you can review everything you’ve done with “git status”.
$ git status
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
etc/seriousapp/application.properties
nothing added to commit but untracked files present (use "git add" to track)
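Here the file is listed as untracked because it was not part of the previous commit; a file that was already tracked would be listed as modified instead. It is good practice to add files as soon as you’ve finished with them :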
git add etc/seriousapp/application.properties
Once all your modifications are done (and added), you can commit
git commit -m "Release 1.1 (JIRA 3325)"
It is now time to follow your company's testing procedure, and validate that your environment is working fine. Once you are ready to deploy your changes to the target environment, first check that everything is committed :
$ git status
On branch master
nothing to commit, working directory clean
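This check can also be scripted, so that packaging is refused when something is left uncommitted. The helper below is a sketch only: require_clean is a hypothetical name, and it relies on "git status --porcelain", which prints one line per pending change and nothing at all when the system is clean.

```shell
#!/bin/sh
# Sketch of a pre-packaging guard; require_clean is a hypothetical helper name.
# It succeeds only when git has nothing to report for the given directory.
require_clean() {
    repo_dir=${1:-.}
    # --porcelain prints one line per modified, staged or untracked file.
    pending=$(git -C "$repo_dir" status --porcelain) || return 2
    if [ -n "$pending" ]; then
        echo "Uncommitted changes in $repo_dir:" >&2
        echo "$pending" >&2
        return 1
    fi
    return 0
}
```

A possible use is to chain it with the next step, for example "require_clean . && git bundle create /path/to/file.bundle master", so that no bundle is produced from a system that still has pending changes.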
Then you create a bundle. A bundle is a single git file that packs the commits needed to move from one version to another, so that they can be carried to a system that has no connection to the source.
$ git bundle create /path/to/file.bundle master
After this step, you can move the “file.bundle” to the target system, and import it using the following command (check that the system is clean) :
$ git status
On branch master
nothing to commit, working directory clean
$ git pull --rebase /path/to/file.bundle master
The “--rebase” option is useful when both the source and the target systems have been modified and have started diverging. Rebase first rewinds the target system to a version known by both systems, applies the bundle, then re-applies the changes that were committed on the target. After this operation, you get a system with both the desired changes and the modifications of the target system. If “--rebase” is not used, git merges the changes instead, which causes a lot of issues when mirroring the modifications back to the source system.
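The whole hand-over, including a diverged target, can be rehearsed in temporary directories. The following sketch assumes two directories named test and prod standing in for the two systems; the crontab file and the cleanup.sh path are hypothetical and only serve as the "quick fix" made in production. It also runs "git bundle verify" before importing, which checks that the bundle is valid and applies to the target repository.

```shell
#!/bin/sh
# Sketch: hand a release over from "test" to a diverged "prod" with a bundle.
set -eu
sandbox=$(mktemp -d)
mkdir -p "$sandbox/test/etc/seriousapp"

# --- Test system: Release 1.0 ---
cd "$sandbox/test"
git init -q
git symbolic-ref HEAD refs/heads/master      # use "master" whatever the local default is
git config user.name "sandbox"               # placeholder identity
git config user.email "sandbox@example.invalid"
echo "CONNECTION_TIMEOUT=5000" > etc/seriousapp/application.properties
git add .
git commit -q -m "Release 1.0"

# --- Production system: starts as an identical copy ---
cp -R "$sandbox/test" "$sandbox/prod"

# --- Production diverges: a quick fix is committed locally (hypothetical file) ---
cd "$sandbox/prod"
echo "0 3 * * * /home/seriousapp/tools/cleanup.sh" > etc/seriousapp/crontab
git add .
git commit -q -m "Hotfix: housekeeping crontab"

# --- Test system: Release 1.1 is committed and packaged ---
cd "$sandbox/test"
echo "CONNECTION_TIMEOUT=10000" >> etc/seriousapp/application.properties
git add .
git commit -q -m "Release 1.1"
git bundle create "$sandbox/file.bundle" master

# --- Production system: verify the bundle, then import it ---
cd "$sandbox/prod"
git bundle verify "$sandbox/file.bundle"
git pull -q --rebase "$sandbox/file.bundle" master
history=$(git log --format=%s)
echo "$history"
```

After the pull, the production history is linear: "Release 1.0", then "Release 1.1", with the production-only hotfix re-applied on top. Sending the hotfix back is the same procedure in the other direction: create a bundle on prod and pull it on test.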
What is nice with this approach is that it is completely symmetric : the bundle file can come from either the test or the production system. Creating and editing files remains unconstrained, but all modifications are silently tracked by git, with full control over rollback (using git checkout) and difference checking (using git diff).
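To make the rollback and difference checking concrete, here is a short sketch in a temporary repository: a tracked file is edited by hand, "git diff" shows the edit, and "git checkout" restores the committed version. The file contents and the placeholder identity are assumptions for the example.

```shell
#!/bin/sh
# Sketch: inspect and undo an uncommitted edit in a temporary repository.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p etc/seriousapp
git init -q
git config user.name "sandbox"                    # placeholder identity
git config user.email "sandbox@example.invalid"
echo "CONNECTION_TIMEOUT=5000" > etc/seriousapp/application.properties
git add .
git commit -q -m "Release 1.0"

# Someone edits a tracked file by hand.
echo "CONNECTION_TIMEOUT=10000" >> etc/seriousapp/application.properties

# Difference checking: which files changed, and how?
changed=$(git diff --name-only)
git --no-pager diff

# Rollback: restore the file as recorded in the last commit.
git checkout -- etc/seriousapp/application.properties
after=$(git status --porcelain)
```

This covers edits that were not committed yet. For a release that has already been committed, "git revert" is one option to consider: it adds a new commit that undoes the earlier one, so the history of what happened is preserved.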
Better control using tags
Although the system is already usable, an important feature is missing. In production environments, we want to track who applied a given modification, and when.
The tag mechanism of git can help us here :
After installing a version on a production system, the operator tags the version with a reference to the intervention :
git tag -a "v1.1" -m "Release plan R45"
You can freely choose tag names and messages, but here again, follow your local procedures (for example, refer to a release plan).
Later on, you can query the system for all the tags and all the installation dates with the following commands :
$ git tag
v1.1
v1.2
$ git for-each-ref --sort=taggerdate --format '%(taggerdate) %(refname) %(taggername) %(subject)' refs/tags
Sun Jun 5 22:58:02 2016 +0200 refs/tags/v1.1 frank Release plan R45
Sun Jun 5 23:02:15 2016 +0200 refs/tags/v1.2 jenny Release plan R46
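The report above depends on annotated tags (created with -a), because only those record a tagger and a date. The sketch below rehearses it in a temporary repository; the operators frank and jenny are simulated with "git -c user.name=...", and the dates are left out of the format so that the output is stable. The release.txt file and the e-mail address are placeholders.

```shell
#!/bin/sh
# Sketch: record two installations with annotated tags and list them.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q
git config user.email "ops@example.invalid"       # placeholder address

# The tagger name is what appears in the report, so each operator is simulated here.
echo "1.1" > release.txt
git add .
git -c user.name=frank commit -q -m "Release 1.1"
git -c user.name=frank tag -a "v1.1" -m "Release plan R45"

sleep 1   # make sure the two tagger dates differ

echo "1.2" > release.txt
git add .
git -c user.name=jenny commit -q -m "Release 1.2"
git -c user.name=jenny tag -a "v1.2" -m "Release plan R46"

# One line per installation, oldest first: tag, operator, message.
report=$(git for-each-ref --sort=taggerdate \
    --format '%(refname:short) %(taggername) %(subject)' refs/tags)
echo "$report"
```

On a real production system each operator would work under their own git identity, so the name recorded in the tag reflects who performed the installation.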
Summary
Using git on the various instances of a system makes it possible to detect and correct system divergence. The use of “bundle” files keeps environments isolated from each other, and tagging (on the production system) provides a simple way to track who installed each release, and when.
Git was not designed for this purpose, but its versatility and flexibility make it a surprisingly well-suited tool to track system divergence without changing how our company manages releases.
Git offers many other tools that are not described in this blog post. For example, “git diff” lets you quickly check what has changed in the last bundle. Branches might also be beneficial when it comes to experimenting or applying temporary patches. I have not yet discovered all the benefits of git, but its adoption in production environments has certainly made our lives easier.