linux-ansible-guide

Posted on 2022-09-30 Edited on 2023-08-16

Overview

Ansible is an open source IT automation engine that automates provisioning, configuration management, application deployment, orchestration, and many other IT processes.

basic arch

Ansible works by connecting to your nodes and pushing out small programs—called modules—to these nodes. Modules are used to accomplish automation tasks in Ansible. These programs are written to be resource models of the desired state of the system. Ansible then executes these modules and removes them when finished.

terminology:

Control node: the host on which you use Ansible to execute tasks on the managed nodes
Managed node: a host that is configured by the control node
Host inventory: a list of managed nodes
Ad-hoc command: a simple one-off task
Playbook: a set of repeatable tasks for more complex configurations
Module: code that performs a particular common task such as adding a user, installing a package, etc.

Concepts

$ansible --version
ansible 2.9.27
  config file = /etc/ansible/ansible.cfg ------> the cfg used
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jun 28 2022, 15:30:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

# get available modules
$ansible-doc -l

# list all hosts
$ansible all --list-hosts
# list hosts in web group
$ansible web --list-hosts

# get help and example for each module
$ansible-doc copy
$ansible-doc shell

# show inventory graph (needs a newer Ansible that ships ansible-inventory)
$ansible-inventory -i host --graph
@all:
  |--@ungrouped:
  |  |--172.17.0.3

# show in json format
$ansible-inventory -i host --list
{
    "_meta": {
        "hostvars": {}
    }, 
    "all": {
        "children": [
            "ungrouped"
        ]
    }, 
    "ungrouped": {
        "hosts": [
            "172.17.0.3"
        ]
    }
}

conf

The default configuration file is /etc/ansible/ansible.cfg.

Ansible searches for a configuration file in the following order and uses the first one it finds; settings from the other files are not merged.

ANSIBLE_CONFIG (environment variable)
ansible.cfg (in the current directory)
~/.ansible.cfg (home directory)
/etc/ansible/ansible.cfg (global)

All parameters of conf and conf example

# dump the current setting
$ansible-config dump

# Only show configurations that have changed from the default
$ansible-config dump --only-changed

$cat ansible.cfg
[defaults]
# third-party Mitogen strategy plugin that speeds up Ansible
strategy_plugins = /xxx/mitogen-0.3.3/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

# upper limit on parallel processes; the effective number can be lower because each fork consumes CPU and memory
# suggestion: 30-50 on a server, and fewer than the total number of CPUs
forks          = 50
# do not gather facts by default; a play must set gather_facts: True to get them
gathering      = explicit
# SSH timeout
timeout = 60
log_path = ./log/ansible.log

[ssh_connection]
# Pipelining can result in a very significant performance improvement when enabled.
# However, it conflicts with privilege escalation (become): for 'sudo' operations you
# must first disable 'requiretty' in /etc/sudoers on all managed hosts, which is why
# it is disabled by default.
pipelining = True

inventory

The inventory file contains the IP address or DNS information about the list of managed hosts we want to work with.

Hosts in an inventory file can be organized into groups so that tasks can be run against a whole group at once. Groups are optional: if the inventory defines none, Ansible still provides two default groups, "all" and "ungrouped".

ALL GROUP - Every host in the inventory file belongs to the all group by default.
UNGROUPED - Hosts that are not part of any user-defined group are automatically assigned to the ungrouped group.

inventory path
default: /etc/ansible/hosts. It can be changed in ansible.cfg or with the -i option on the ansible command line.

$cat ansible.cfg
[defaults]
inventory = $HOME/hosts

$ansible all -i xxx/hosts --list-hosts

inventory file example

# hostnames such as web1, vm2 and vm3 must be resolvable, for example through /etc/hosts
web1
[web]
vm2
vm3

[db]
192.168.1.2

[log]
192.168.1.100

# use the default inventory
$ansible web -m ping

# use an explicit inventory file
$ansible web -m ping -i ./hosts

# use an explicit inventory given on the command line
$ansible all  -i 172.17.0.2,172.17.0.3 -m ping
# NOTE: the trailing comma is required even for a single host, otherwise the value is treated as a file path
$ansible all  -i 172.17.0.2, -m ping
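
For comparison, the same inventory could also be written in YAML, which Ansible accepts alongside the INI style shown above. This is only a sketch: the file name hosts.yml is an assumption, and web1 ends up in ungrouped because it belongs to no group other than all.

```yaml
# hosts.yml (hypothetical file name), use it with: ansible web -m ping -i ./hosts.yml
all:
  hosts:
    web1:              # no user-defined group, so it is reported under "ungrouped"
  children:
    web:
      hosts:
        vm2:
        vm3:
    db:
      hosts:
        192.168.1.2:
    log:
      hosts:
        192.168.1.100:
```

Running ansible-inventory -i hosts.yml --graph should show the same group tree as the INI version.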

modules

Modules (also referred to as “task plugins” or “library plugins”) are discrete units of code that can be used from the command line or in a playbook task. Ansible executes each module, usually on the remote managed node, and collects return values.

Each module supports taking arguments. Nearly all modules take key=value arguments, space delimited. Some modules take no arguments, and the command/shell modules simply take the string of the command you want to run.

Modules should be idempotent, and should avoid making any changes if they detect that the current state matches the desired final state. When used in an Ansible playbook, modules can trigger ‘change events’ in the form of notifying handlers to run additional tasks

# module from command line
$ansible webservers -m service -a "name=httpd state=started"

# use up to 100 parallel forks
$ansible webservers -m ping -f 100
$ansible webservers -m command -a "/sbin/reboot -t now"

# module from playbook
$cat play.yml
- name: restart webserver play
  hosts: webservers
  tasks:
    - name: restart webserver
      service:  # module
        name: httpd  # parameter
        state: restarted # parameter

The most commonly used modules are file, include, template, command, service, shell, lineinfile, copy, yum, user, systemd, cron, etc.

file: Sets attributes (owner, group, mode) of files, directories, and symlinks. It can also create empty files and directories, and remove files, directories, and symlinks.
- ansible test-servers -m file -a 'path=/tmp/test state=directory mode=0755'
ping: Checks whether Ansible can connect to the hosts defined in the inventory file and run Python on them.
- ansible test-servers -m ping
copy: Copies a file (directories are supported as well) from the control node to the managed nodes.
- ansible test-servers -m copy -a 'src=/home/knoldus/Personal/blogs/blog3.txt dest=/tmp'
- ansible test-servers -m copy -a 'src=/home/knoldus/Personal/blogs dest=/tmp'
fetch: Transfers files (directories are not supported) from a remote host to the local host. This is the reverse of the copy module.
- ansible test-servers -m fetch -a 'src=/var/log/nginx/access.log dest=fetched'
synchronize: A wrapper around rsync that pushes or pulls whole directories.
- ansible all -m synchronize -a 'mode=pull src=/export/Data/xcgroup/persistence dest=fetched'   # pull remote src into local fetched/
- ansible all -m synchronize -a 'mode=push src=/export/Data/xcgroup/persistence dest=fetched'   # push local src to remote fetched/
yum: Installs, upgrades, or removes packages with the yum package manager.
- ansible test-servers -m yum -a 'name=httpd,postfix state=present'
shell: Runs a UNIX command through a shell on the managed nodes, so pipes and redirects work.
- ansible test-servers -m shell -a 'ls -la'
script: Transfers a local script to the managed nodes and runs it there; useful when we want to run a bunch of commands.
- ansible test-servers -m script -a './test.sh'
service: Ensures a service is in the desired state, for example that it is running.
- ansible test-servers -m service -a 'name=httpd state=started'
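
The ad-hoc commands above map one-to-one onto playbook tasks. The following is a minimal sketch of that mapping, not a recommended layout: the file name modules.yml and the use of become are assumptions, while paths and package names are taken from the commands above.

```yaml
# modules.yml (hypothetical), run with: ansible-playbook -i ./hosts modules.yml
- name: Common modules as playbook tasks
  hosts: test-servers
  become: true            # assumption: package and service changes need root
  tasks:
    - name: Create a directory (file module)
      ansible.builtin.file:
        path: /tmp/test
        state: directory
        mode: "0755"

    - name: Copy a file from the control node (copy module)
      ansible.builtin.copy:
        src: /home/knoldus/Personal/blogs/blog3.txt
        dest: /tmp/

    - name: Pull a log file back to the control node (fetch module)
      ansible.builtin.fetch:
        src: /var/log/nginx/access.log
        dest: fetched

    - name: Install packages (yum module)
      ansible.builtin.yum:
        name:
          - httpd
          - postfix
        state: present

    - name: Make sure the service is running (service module)
      ansible.builtin.service:
        name: httpd
        state: started
```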

template: The template module renders a Jinja2 template on the control node and copies the result to the managed node. It works like the copy module, except that variables referenced in the template are replaced with their values.

- name: a play
  hosts: all
  gather_facts: no
  vars:
    variable_to_be_replaced: 'Hello world'
    inline_variable: 'hello again.'
  tasks:
  - name: Ansible Template Example
    template:
      src: hello_world.j2 # in template, we can use var defined here
      dest: /Users/mdtutorials2/Documents/Ansible/hello_world.txt

lineinfile: Ensures that a particular line is present in or absent from a file, or replaces an existing line matched by a regular expression. Set the file to modify with the path parameter (dest is an older alias) and the text with the line parameter. By default a new line is appended at the end of the file; if the line is already there, it is not added again.
- ansible test-servers -m lineinfile -a 'path=/etc/selinux/config regexp=^SELINUX= line=SELINUX=enforcing'
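
The three typical uses of lineinfile (append, replace, remove) could look like this in a playbook. This is a sketch; /tmp/example.conf and its contents are hypothetical, only the SELinux task mirrors the ad-hoc command above.

```yaml
- name: lineinfile sketch
  hosts: test-servers
  become: true
  tasks:
    - name: Append a line at EOF unless it is already present
      ansible.builtin.lineinfile:
        path: /tmp/example.conf      # hypothetical file
        line: "max_clients=200"
        create: true                 # create the file if it does not exist yet

    - name: Replace the line matched by regexp, or append it if nothing matches
      ansible.builtin.lineinfile:
        path: /etc/selinux/config
        regexp: '^SELINUX='
        line: SELINUX=enforcing

    - name: Remove every line matching the pattern
      ansible.builtin.lineinfile:
        path: /tmp/example.conf
        regexp: '^debug='
        state: absent
```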

replace: The replace module replaces all instances of a defined string within a file.

- hosts: 127.0.0.1
  tasks:
  - name: Ansible replace string example
    replace:
      path: /etc/ansible/sample.txt
      regexp: 'Unix'
      replace: "Linux"

- hosts: 127.0.0.1
  tasks:
  - name: Ansible replace string example
    replace:
      path: /etc/hosts
      regexp:  '(\s+)server\.myubuntu\.com(\s+.*)?$'
      replace: '\1server.linuxtechi.info\2' # use captured tokens

include: Includes another playbook or task file in the current playbook. Newer Ansible versions replace it with import_playbook, include_tasks, and import_tasks.
user: Manages user accounts on the managed nodes, for example to add a particular user.
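
A short sketch of both modules in one playbook file. The user name bob and the file webservers.yml are placeholders; import_playbook is used here because it is the usual spelling for pulling in another playbook in current Ansible releases.

```yaml
- name: Create an application user
  hosts: all
  become: true
  tasks:
    - name: Ensure user bob exists
      ansible.builtin.user:
        name: bob
        shell: /bin/bash
        state: present

# a second entry in the same file: reuse the plays of another playbook
- name: Reuse plays from another file
  ansible.builtin.import_playbook: webservers.yml   # hypothetical file
```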

Ad-Hoc

You can also use Ansible to run ad-hoc commands. To do this, you will need to run a command or call a module directly from the command line. No playbook is used. This is fine for a one time task.

host pattern

# command format
# without -m, the default module is 'command', which is similar to `shell` but does not run through a shell
$ansible [host-pattern] -m [module] -a "[module options]"


$ansible all -m copy -a 'src=dvd.repo dest=/etc/yum.repos.d owner=root group=root mode=0644'

# Each node reports SUCCESS and "changed": true, meaning the module ran successfully
# and the file was created or changed. If we run the command again,
# the output will include "changed": false, meaning the file is already present
# and configured as required. In other words,
# Ansible only makes changes when the current state differs from the desired state.
# This is what is known as "idempotence".

$ansible all -m ansible.builtin.service -a "name=libvirtd state=started"

Playbook (suggested way)

A playbook runs in order from top to bottom. Within each play, tasks also run in order from top to bottom. Playbooks with multiple ‘plays’ can orchestrate multi-machine deployments, running one play on your webservers, then another play on your database servers, then a third play on your network infrastructure, and so on.

Plays consist of an ordered set of tasks to execute against host selections from your Ansible inventory file. Tasks are the pieces that make up a play and call Ansible modules. In a play, tasks are executed in the order in which they are written.

Ansible includes a “check mode” which allows you to validate playbooks and ad-hoc commands before making any state changes on a system. This shows you what Ansible would do, without actually making any changes.

Handlers in Ansible are used to run a specific task only after a change has been made to the system. They are triggered by tasks and run once, at the end of the play in which they were notified.
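
Check mode is normally switched on from the command line with --check, and it can also be controlled per task with the check_mode keyword. A minimal sketch, assuming a file named check.yml and an httpd package as the example target:

```yaml
# check.yml (hypothetical), preview with: ansible-playbook -i ./hosts check.yml --check --diff
- name: Check mode sketch
  hosts: webservers
  tasks:
    - name: With --check this only reports whether httpd would be installed
      ansible.builtin.yum:
        name: httpd
        state: present

    - name: Always run for real, even when --check is given
      ansible.builtin.command: rpm -q httpd
      check_mode: false
      changed_when: false     # a query never changes the system
      failed_when: false      # rpm -q returns non-zero when the package is absent

    - name: Never change anything, even without --check
      ansible.builtin.yum:
        name: httpd
        state: latest
      check_mode: true
```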

playbook handlers

By default, handlers run after all the tasks in a particular play have been completed. Notified handlers are executed automatically after each of the following sections, in the following order: pre_tasks, roles/tasks and post_tasks. This approach is efficient, because the handler only runs once, regardless of how many tasks notify it. For example, if multiple tasks update a configuration file and notify a handler to restart Apache, Ansible only bounces Apache once to avoid unnecessary restarts.

# basic playbook
$cat play.yml
- name: Basic playbook
  hosts: all

  tasks:
  - name: Ensure package "postfix" is installed
    yum:
      name: "postfix"
      state: installed

# with variable
$cat play.yml
- name: Variables playbook                                                      
  hosts: all                                                                    
  vars:                                                                         
      state: installed                                                          
      user: bob                                                                 
  tasks:                                                                        
  - name: Add the user {{ user }}                                               
    ansible.builtin.user:                                                       
      name: "{{ user }}"                                                        
  - name: Ensure package "postfix" is {{ state }}
    yum:                                                                        
      name: "postfix"                                                           
      state: "{{ state }}"    

# with handler
$cat play.yml
- name: Verify apache installation
  hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
    - name: Ensure apache is at the latest version
      ansible.builtin.yum:
        name: httpd
        state: latest

    - name: Write the apache config file
      ansible.builtin.template:
        src: /srv/httpd.j2
        dest: /etc/httpd.conf
      notify:
      - Restart apache # handler

    - name: Ensure apache is running
      ansible.builtin.service:
        name: httpd
        state: started

  handlers:
    - name: Restart apache # defined handler
      ansible.builtin.service:
        name: httpd
        state: restarted
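
To illustrate the run-once behaviour of handlers described earlier, here is a sketch in which two tasks notify the same handler. The template names and destination paths are hypothetical; Apache is restarted at most once, after both tasks have run, and only if at least one of them reported a change.

```yaml
- name: Handler runs once per play
  hosts: webservers
  become: true
  tasks:
    - name: Write the main config
      ansible.builtin.template:
        src: httpd.conf.j2                  # hypothetical template
        dest: /etc/httpd/conf/httpd.conf
      notify: Restart apache

    - name: Write a vhost config
      ansible.builtin.template:
        src: vhost.conf.j2                  # hypothetical template
        dest: /etc/httpd/conf.d/vhost.conf
      notify: Restart apache

  handlers:
    - name: Restart apache
      ansible.builtin.service:
        name: httpd
        state: restarted
```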

# -v enables verbose output; -vv, -vvv and -vvvv add progressively more detail
$ansible-playbook -i host play.yml  -v

################### retry for the failed nodes ########
# play.retry lists the hosts that failed; Ansible generates it automatically when retry_files_enabled is set
$ansible-playbook -i host --limit @play.retry play.yml

conditional task
Run a task only when a condition is met. The condition can use a custom variable, an Ansible built-in variable (fact), or the result of another task.

- hosts: webservers
  remote_user: root
  gather_facts: true # the ansible_* facts used below require fact gathering
  tasks:
  - name: Only host 192.168.2.101 runs this task
    debug: 'msg=" {{ ansible_default_ipv4.address }}"'
    when: ansible_default_ipv4.address == "192.168.2.101"

  - name: Hosts with memtotal < 500M and 2 processor cores run this task
    debug: 'msg="{{ ansible_fqdn }}"'
    when: ansible_memtotal_mb < 500 and ansible_processor_cores == 2

  - name: All hosts run this task
    shell: hostname
    register: info
  - name: Hosts whose hostname is lamp1 run this task
    debug: 'msg="{{ ansible_fqdn }}"'
    when: info['stdout'] == "lamp1"
  - name: Hosts whose hostname starts with l run this task
    debug: 'msg="{{ ansible_fqdn }}"'
    when: info['stdout'].startswith('l')

For more keywords that are available in playbooks, refer to playbook keywords.

Error Handling In Playbooks

Ignoring Failed Commands

Generally playbooks will stop executing any more steps on a host that has a task fail. Sometimes, though, you want to continue on. To do so, write a task that uses ignore_errors, as shown below. This only works when the task is able to run and return a value of ‘failed’; it does not help when the task cannot be started at all. With ignore_errors the error output is still printed, but the play continues with the next task.

Be careful when a later task depends on the registered result of the failed task: the result may not be the object described in the module documentation, because the command may not have run on the node at all. Without ignore_errors this cannot happen, since the later task never runs if the previous one fails.

- name: this will not be counted as a failure
  command: /bin/false
  register: result
  ignore_errors: yes

# output looks like this (here the task also had no_log: true, so the result is hidden)
  fatal: [10.229.225.6]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result"}
...ignoring
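
When a later task consumes the registered result of a task that ran with ignore_errors, it is safer to test the result explicitly than to assume the expected fields exist. A minimal sketch, using the built-in failed test and a default() fallback for a missing field:

```yaml
- name: ignore_errors with a guarded follow-up
  hosts: all
  tasks:
    - name: Run a command that may fail
      ansible.builtin.command: /bin/false
      register: result
      ignore_errors: true

    - name: Report the failure instead of assuming success
      ansible.builtin.debug:
        msg: "command failed with rc={{ result.rc | default('n/a') }}"
      when: result is failed

    - name: Use the output only when the command really succeeded
      ansible.builtin.debug:
        var: result.stdout
      when: result is succeeded
```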

Handlers and Failure

When a task fails on a host, handlers which were previously notified will not be run on that host. This can lead to cases where an unrelated failure can leave a host in an unexpected state. For example, a task could update a configuration file and notify a handler to restart some service. If a task later on in the same play fails, the service will not be restarted despite the configuration change.

You can change this behavior with the --force-handlers command-line option, or by including force_handlers: True in a play, or force_handlers = True in ansible.cfg. When handlers are forced, they will run when notified even if a task fails on that host.
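
A sketch of the play-level setting: with force_handlers enabled, the restart below still happens even though a later task fails. The config source file is hypothetical.

```yaml
- name: force_handlers sketch
  hosts: webservers
  force_handlers: true
  tasks:
    - name: Update the config and request a restart
      ansible.builtin.copy:
        src: httpd.conf                     # hypothetical local file
        dest: /etc/httpd/conf/httpd.conf
      notify: Restart apache

    - name: A later task that fails
      ansible.builtin.command: /bin/false

  handlers:
    - name: Restart apache                  # still runs on this host because handlers are forced
      ansible.builtin.service:
        name: httpd
        state: restarted
```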

Controlling What Defines Failure

Ansible lets you define what “failure” means in each task using the failed_when conditional. As with all conditionals in Ansible, lists of multiple failed_when conditions are joined with an implicit and, meaning the task only fails when all conditions are met. If you want to trigger a failure when any of the conditions is met, you must define the conditions in a string with an explicit or operator.

- name: Run a command whose result is never logged and never counts as a failure
  command: /usr/bin/example-command -x -y -z
  register: command_result
  no_log: true       # never print result of this task.
  failed_when: false # never fail this task, but use command_result in another task

- name: Fail task when the command error output prints FAILED
  command: /usr/bin/example-command -x -y -z
  register: command_result
  failed_when: "'FAILED' in command_result.stderr"

- name: Fail task when both files are identical
  raw: diff foo/file1 bar/file2
  register: diff_cmd
  failed_when: diff_cmd.rc == 0 or diff_cmd.rc >= 2

- name: Check if a file exists in temp and fail task if it does
  command: ls /tmp/this_should_not_be_here
  register: result
  failed_when:
    - result.rc == 0
    - '"No such" not in result.stdout'

Overriding The Changed Result

When a shell/command or other module runs it will typically report “changed” status based on whether it thinks it affected machine state.

Sometimes you will know, based on the return code or output that it did not make any changes, and wish to override the “changed” result such that it does not appear in report output or does not cause handlers to fire:

tasks:

  - shell: /usr/bin/billybass --mode="take me to the river"
    register: bass_result
    changed_when: "bass_result.rc != 2"

  - command: /bin/fake_command
    register: result
    ignore_errors: True
    changed_when:
      - '"ERROR" in result.stderr'
      - result.rc == 2
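
The most common use of changed_when is marking read-only commands as never changing anything, so that reports stay clean and no handlers fire. A short sketch; the sync-config script and its "updated" output are hypothetical.

```yaml
- name: changed_when sketch
  hosts: all
  tasks:
    - name: Read-only command never reports changed
      ansible.builtin.command: uptime
      register: uptime_result
      changed_when: false

    - name: Report changed only when the tool says it updated something
      ansible.builtin.command: /usr/local/bin/sync-config   # hypothetical script
      register: sync_result
      changed_when: "'updated' in sync_result.stdout"
```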

debug module
This module prints statements during execution and can be useful for debugging variables or expressions without necessarily halting the playbook.

Parameters

msg: The customized message that is printed. If omitted, prints a generic message.
var: A variable name to debug. Mutually exclusive with the msg option. Be aware that this option already runs in Jinja2 context and has an implicit {{ }} wrapping, so you should not be using Jinja2 delimiters unless you are looking for double interpolation.
verbosity: A number that controls when the debug task runs; if set to 3, it only runs when the playbook is started with -vvv or above. Default: 0

- name: Get uptime information
  ansible.builtin.shell: /usr/bin/uptime
  register: result

- name: Print return information from the previous task
  ansible.builtin.debug:
    var: result
    verbosity: 2

#########################################

- name: Task name
  stat:
    path: /etc/lt.conf
  register: register_name

- name: Task name
  debug:
    msg: "The file or directory exists"
  when: register_name.stat.exists

Asynchronous Actions and Polling(task level)

Ansible runs tasks synchronously by default: it keeps the connection to the remote node open until the task is completed, so within a playbook each task blocks the subsequent tasks until it completes. If a task fails on a host, the remaining tasks do not run on that host.

Some of the long-running tasks could be

Downloading a Big File from URL
Running a Script known to run for a long time
Rebooting the remote server and waiting for it to come back

This can cause problems. Suppose you have a task in your playbook which takes more than, say, 10 minutes to execute. This means that the SSH connection between the Ansible controller and the target machine must stay stable for more than 10 minutes. The task may take longer to complete than the SSH session allows for, causing a timeout. Instead, one can run the long-running process in the background and perform other tasks concurrently.

To avoid blocking or timeout issues, you can use asynchronous mode to run all of your tasks at once and then poll until they are done.

To enable asynchronous mode within an Ansible playbook, we use two keywords: async and poll.

async - The value of the async keyword is the total time (in seconds) allowed for the task to complete. If the task is still running when that time is over, it is treated as timed out. async also sends the task to the background, so its final execution status can be verified later.
poll - The poll keyword tracks the status of the job that was started by async and is running in the background. Its value decides how frequently Ansible checks whether the background task has completed.
- Polling is enabled automatically whenever you use async; the default interval is 10 seconds.
- When poll is set to a positive value, Ansible avoids connection timeouts but still blocks the next task in your playbook, waiting until the async task completes, fails, or times out.

NOTE

async without poll: the default poll interval (10s) applies, so async just sets a timeout for the task
async with a positive poll: same as above, async just sets a timeout for the task
async with poll == 0: truly asynchronous, the task is marked as finished immediately without waiting for its result. If a second task on the same node depends on the first one, do NOT use this mode!

##############--- set time out for a task ###########################
- name: async and poll example playbook
  hosts: workers
  become: true
  remote_user: ansible_user
  tasks:
    - name: update the system packages
      command: yum update -y
      async: 180 # the total time allowed to complete the package update task
      poll: 10 # Polling Interval in Seconds
      register: package_update

    - name: task-2 to create a test user # blocked until the first task finishes or times out
      user: name=async_test state=present shell=/bin/bash

##############--- set time out and async job for a task ###########################
- name: async and poll example playbook
  hosts: workers
  become: true
  remote_user: ansible_user
  tasks:
    - name: sleep for 60 seconds
      command: /bin/sleep 60
      async: 80 # the total time allowed to complete the sleep task
      poll: 0 # No need to poll just fire and forget the sleep command
      register: sleeping_node

    - name: task-2 to create a test user # runs even while the first task is still in progress
      user: name=async_test-2 state=present shell=/bin/bash      

- name: async and poll example playbook
  hosts: workers
  become: true
  remote_user: ansible_user
  tasks:
    - name: sleep for 20 seconds
      command: /bin/sleep 20
      async: 30 # the total time allowed to complete the sleep task
      poll: 0 # No need to poll just fire and forget the sleep command
      register: sleeping_node

    - name: task-2 to create a test user
      user: name=async_test-2 state=present shell=/bin/bash

    # check the async job status
    - name: Checking the Job Status running in background
      async_status:
        jid: "{{ sleeping_node.ansible_job_id }}"
      register: job_result
      until: job_result.finished # Retry within limit until the job status changed to "finished": 1
      retries: 30 # Maximum number of retries to check job status
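
One of the long-running cases listed earlier, rebooting a server and waiting for it to come back, is a natural fit for poll: 0. The sketch below is one common way to do it, not the only one; the sleep of 2 seconds and the delay and timeout values are assumptions to adjust for your environment.

```yaml
- name: reboot with fire-and-forget async
  hosts: workers
  become: true
  tasks:
    - name: Trigger the reboot in the background so the SSH session can close cleanly
      ansible.builtin.shell: sleep 2 && /sbin/shutdown -r now
      async: 1
      poll: 0

    - name: Wait until the node accepts connections again
      ansible.builtin.wait_for_connection:
        delay: 30      # give the node time to go down first
        timeout: 300   # give up after 5 minutes

    - name: Continue with tasks that need the rebooted node
      ansible.builtin.command: uptime
      changed_when: false
```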

Strategy

When running Ansible playbooks, you might have noticed that Ansible runs each task on all nodes before moving on: it does not start the next task until the current task has completed on every node, which can take a lot of time in some cases. This is the default “linear” strategy; it can be changed to “free”.

linear: run the first task of the play on all nodes (up to forks at a time); when the first task has finished on all nodes, run the second task on all nodes
free: nodes that have finished the first task can run the second task without waiting for hosts that are still running the first one, so a host that is slow or stuck on a specific task won’t hold up the rest of the hosts and tasks

# play level
$cat play.yml
- name: free strategy demo
  hosts: workers
  strategy: free

# global setting
$cat /etc/ansible/ansible.cfg
[defaults]
strategy = free

NOTE

if tasks on different nodes depend on each other, use linear; otherwise use free
free only speeds up plays with more than one task.

Python API(not frequently used)

python api

FAQ

Ansible playbook hangs during execution

Refer to why ansible hangs

SSH timeout
command hangs in remote node

speed up playbook

tuning ansible

NameError: name ‘temp_path’ is not defined

In this case the task does not run at all. The likely cause is that there is no disk space left on the remote node, because Ansible needs to copy the module to the remote host under /root/.ansible/tmp.

change remote_tmp

remote_tmp is set by ansible.cfg

$cat /etc/ansible/ansible.cfg
...
remote_tmp     = ~/.ansible/tmp

# or change it from env
$ANSIBLE_REMOTE_TEMP=/cedar/.tmp ansible-playbook play.yml -i ./hosts
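
The temp directory can probably also be set per play or per host through a variable. The sketch below assumes the shell plugin in use honours ansible_remote_tmp (the default sh plugin documents it); /cedar/.tmp is the same example path as above.

```yaml
- name: Use a different remote temp directory for this play
  hosts: all
  vars:
    ansible_remote_tmp: /cedar/.tmp   # assumption: read by the shell plugin in use
  tasks:
    - name: Modules are now staged under /cedar/.tmp on the remote node
      ansible.builtin.ping:
```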

get verbose output

Sometimes, when fact gathering is disabled, it is hard to see why a play fails. In this case, turn it on and run your playbook with -vvv or -vvvv.

$cat ansible.cfg
[defaults]
gathering      = implicit

# OR

$cat play.yml
- name: perf check
  hosts: all
  gather_facts: true
...

# rerun
$ansible-playbook -i 10.211.98.106, playbook/check_file.yml -vvv

set python interpreter at remote node

Ansible copies a Python module to the remote node and runs it there, so the remote Python interpreter must be compatible with the module copied from the control node.

# check the Python used on the control node, where you run ansible, to see which Python version the modules are written for
$ansible --version
ansible [core 2.13.4]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/data/Anaconda3/envs/py3.9/lib/python3.9/site-packages/ansible ----->ansible modules
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/data/Anaconda3/envs/py3.9/bin/ansible
  python version = 3.9.0 (default, Nov 15 2020, 14:28:56) [GCC 7.3.0] ----->ansible module version
  jinja version = 3.1.2
  libyaml = True

# by default Ansible selects the remote Python interpreter automatically based on its own rules,
# so it may select a Python that is not compatible; here we set the remote Python interpreter explicitly
$cat ansible.cfg
[defaults]
interpreter_python = /usr/bin/python3

# OR

$cat play.yml
- name: a play
  hosts: all
  gather_facts: no
  vars:
    ansible_python_interpreter: '/usr/bin/python3'
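
When only some nodes need a different interpreter, the same variable can be set per host in the inventory instead of for the whole play. A sketch in YAML inventory form; the host addresses and the python2.7 path are placeholders.

```yaml
# hosts.yml (hypothetical inventory file)
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3        # default for every host
  hosts:
    10.0.0.4:
    10.0.0.5:
      ansible_python_interpreter: /usr/bin/python2.7    # hypothetical legacy host overrides the default
```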

Cannot handle SSH host authenticity prompts for multiple hosts

$ ansible-playbook -i conf/app/south.host ./playbook/app/check_file.yml
The authenticity of host '10.0.0.4 (10.0.0.4)' can't be established.
ECDSA key fingerprint is SHA256:WkPeJUhNdz/MX3zAy536BHZRC/9INGEQWGhsmAPzkEo.
Are you sure you want to continue connecting (yes/no)? The authenticity of host '10.0.0.5 (10.0.0.5)' can't be established.
ECDSA key fingerprint is SHA256:UJCEM05W15HZuzOLRpxNli+Qnwei7j84u2lbpVFBqkI.
Are you sure you want to continue connecting (yes/no)?

Solution
Edit ansible.cfg and set host_key_checking = False

[defaults]
host_key_checking = False

forks vs serial vs async

Serial sets a number, a percentage, or a list of numbers of hosts you want to manage at a time.
Async tells Ansible to run the task in the background, where it can be checked or followed up later; its value is the maximum time Ansible will wait for that job or task to complete before it times out.
Ansible works by spinning off forks of itself and talking to many remote systems independently. The forks parameter controls how many hosts are configured by Ansible in parallel.

Suggestion

SERIAL: Decides how many nodes are processed in each batch of a play run.

Use: When you need to roll out changes in batches (rolling changes).
FORKS: Maximum number of simultaneous connections Ansible makes for each task.

Use: When you need to manage how many nodes are affected simultaneously.

serial example

By default, with serial set, the play only stops when all hosts of a batch fail (max fail percentage 100%): in that case the remaining batches are not run, even if there are servers left in the inventory. This threshold can be tuned with max_fail_percentage.

---
- name: test play
  hosts: webservers
  # serial: 10% or mix these two format
  # serial:
  #   - 3
  #   - 50%
  serial: 3
  gather_facts: False

  tasks:
    - name: first task
      command: hostname
    - name: second task
      command: hostname

In the above example, if we had 6 hosts in the group ‘webservers’, Ansible would execute the play completely (both tasks) on 3 of the hosts before moving on to the next 3 hosts:

PLAY [webservers] ****************************************

TASK [first task] ****************************************
changed: [web3]
changed: [web2]
changed: [web1]

TASK [second task] ***************************************
changed: [web1]
changed: [web2]
changed: [web3]

PLAY [webservers] ****************************************

TASK [first task] ****************************************
changed: [web4]
changed: [web5]
changed: [web6]

TASK [second task] ***************************************
changed: [web4]
changed: [web5]
changed: [web6]

PLAY RECAP ***********************************************
web1      : ok=2    changed=2    unreachable=0    failed=0
web2      : ok=2    changed=2    unreachable=0    failed=0
web3      : ok=2    changed=2    unreachable=0    failed=0
web4      : ok=2    changed=2    unreachable=0    failed=0
web5      : ok=2    changed=2    unreachable=0    failed=0
web6      : ok=2    changed=2    unreachable=0    failed=0
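
The max_fail_percentage keyword mentioned earlier can be combined with serial to abort a rolling change early. A minimal sketch, with 30 as an arbitrary example value: the percentage has to be exceeded, so with batches of 3 a single failed host (33%) already stops the remaining batches.

```yaml
- name: rolling play that aborts early
  hosts: webservers
  serial: 3
  max_fail_percentage: 30   # example value; abort when more than 30% of a batch fails
  gather_facts: false

  tasks:
    - name: first task
      ansible.builtin.command: hostname
    - name: second task
      ansible.builtin.command: hostname
```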

without serial

---
- name: test play
  hosts: webservers
  gather_facts: False
  tasks:
    - name: first task
      command: hostname
    - name: second task
      command: hostname

PLAY [webservers] ****************************************

TASK [first task] ****************************************
changed: [web3]
changed: [web2]
changed: [web1]
changed: [web4]
changed: [web5]
changed: [web6]

TASK [second task] ***************************************
changed: [web1]
changed: [web2]
changed: [web3]
changed: [web4]
changed: [web5]
changed: [web6]

PLAY RECAP ***********************************************
web1      : ok=2    changed=2    unreachable=0    failed=0
web2      : ok=2    changed=2    unreachable=0    failed=0
web3      : ok=2    changed=2    unreachable=0    failed=0
web4      : ok=2    changed=2    unreachable=0    failed=0
web5      : ok=2    changed=2    unreachable=0    failed=0
web6      : ok=2    changed=2    unreachable=0    failed=0

with many hosts, the number of forks decreases after a while

With a large number of hosts, say 10000+, and forks set to a larger number, say 64, Ansible at first creates almost that many processes. But if you check with ps -ef | grep ansible after a while, the number becomes smaller and smaller over time. To solve this:

use serial (in the playbook) to split the hosts into small batches.
split the hosts into several inventory files outside the playbook.

---
- name: test play
  hosts: webservers
  # serial: 10% or mix these two format
  # serial:
  #   - 3
  #   - 50%
  serial: 3
  gather_facts: False

  tasks:
    - name: first task
      command: hostname
    - name: second task
      command: hostname

$ ansible-playbook -i large_host ./playbook/with_serial_enabled.yml

# split large host into small ones
# 100 lines per small file
# -d use number as suffix
# host. used as prefix
$ split --lines=100 -d large_host host.
host.00 host.01

$ ansible-playbook -i host.00 ./playbook/without_serial.yml
$ ansible-playbook -i host.01 ./playbook/without_serial.yml

compare stdout with number

As stdout is a string, like "45", you have to convert it to an int to compare it with a number

- name: check process fd
  hosts: all

  tasks:
    - name: get fd count of agent
      shell: lsof -p $(pidof agent) | wc -l
      register: ret
      # the int filter converts the string to an integer
      failed_when: ret.stdout | int > 100
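
The same conversion applies in when conditions. A sketch that prints a warning instead of failing the task; the threshold of 100 and the agent process name are taken from the example above, and changed_when: false is added on the assumption that counting file descriptors changes nothing.

```yaml
- name: warn on high fd count
  hosts: all

  tasks:
    - name: get fd count of agent
      ansible.builtin.shell: lsof -p $(pidof agent) | wc -l
      register: ret
      changed_when: false

    - name: Warn only when the numeric value exceeds 100
      ansible.builtin.debug:
        msg: "agent has {{ ret.stdout | int }} open file descriptors"
      when: (ret.stdout | int) > 100
```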