Ansible Playbook - FAH Installation
In our previous step we checked the health of CMOS batteries on our Stack of NUCs.
Now we are going to create and run an Ansible playbook to set up Folding at Home (FAH) on the nodes. I have updated the playbook by ajacocks to add the current release and hack up a quick fix.
Please note that the NUCs I am using in this lab have only 4 cores, and for some WUs (work units) the client will only use 3 cores. So don't expect to score many points with these small boxes.
Purpose:
- Demonstrate running a complex workload: a service combined with configuration files
References
Step 1 - Install the FAHControl app on NUC 1
The official FAHControl download does not work with Ubuntu 22.04. Use https://github.com/cdberkstresser/fah-control instead.
- Open a shell on NUC 1
- Install packages
sudo apt-get install -y python3-stdeb python3-gi python3-all python3-six debhelper dh-python gir1.2-gtk-3.0
- Clone the repo and run the command
git clone https://github.com/cdberkstresser/fah-control
cd fah-control
./FAHControl
Step 2 - Install the FAH client using Ansible
From NUC 1, log in to the Ansible control node, NUC 2.
- Change directory to /home/ansible/my-project and clone the FAH playbook repository
git clone --branch support-7-6-21 https://github.com/doritoes/fah.git
- Change directory to /home/ansible/my-project/fah
- Modify the file /home/ansible/my-project/fah/inventory
- Copy your Ansible node IPs from the file /home/ansible/my-project/hosts to the [clients] section
- chost='(IP of Control Node)'
- cpass='(control-node-password)'
- username='(your Folding@home user name)'
- team='(your team number, if you support a team)'
- passkey='(your Folding@home passkey)'
- Run the playbook
ansible-playbook main.yml
- If you encounter a DNS lookup failure on some or all nodes:
- Your wireless router should be providing DNS information as part of DHCP
- Did you disable the DNS stub resolver in earlier steps?
- If you cannot connect with the control app and/or you see an error regarding a locked database:
- Reboot the node to clear the error
- It seems that running the playbook on an already configured system can start multiple copies of FAH and cause the problem; rebooting solves the issue
- Reboot all the clients to ensure the service registers properly and no duplicate processes are running
ansible clients -b -m reboot
- If you want to confirm your FAH configuration was copied correctly, see the Optional section below
- On NUC 1, open the FAH control program
- Add clients one at a time in FAHControl
- Any name you want
- IP address of the client
- The control password (cpass) you set in the inventory file
Next Step
Congratulations! Your Stack of NUCs is now fully occupied running a valuable workload! Next up is Ansible Playbook - FAH Removal, where we disable FAH and remove it.
Optional
Check FAH Status
- Check the FAHClient status
ansible-playbook checkfahstatus.yml
- checkfahstatus.yml
---
- hosts: clients
  become: true
  become_user: root
  tasks:
    - name: Get FAH service Status
      # state: started also starts FAHClient if it is stopped
      ansible.builtin.systemd:
        state: "started"
        name: "FAHClient"
      register: fah_service_status
    - name: Show status
      debug:
        msg: "{{ fah_service_status.status.ActiveState }}"
- Check the config file
- check-fah-config.yml
cat check-fah-config.yml
---
- hosts: clients
  become: true
  become_user: root
  tasks:
    - name: Read FAH client from config.xml
      shell: cat /etc/fahclient/config.xml
      changed_when: false
      register: configuration
    - name: Dump configuration
      debug:
        var: configuration.stdout_lines
- If the configuration did not apply successfully, re-apply it with the reconfigure-fah.yml playbook listed under Reconfigure FAH Clients below. Be sure the “team” variable is present in the inventory file.
- You might need to reboot the NUCs, not just the service
ansible clients -b -m reboot
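If you would rather check the dumped config.xml programmatically than by eye, the following Python sketch lists which user settings are missing or empty. It assumes a FAH v7 config.xml stores each setting as a top-level element with its value in a v or value attribute. The sample config is made up for illustration, and REQUIRED_SETTINGS, read_fah_settings and missing_settings are hypothetical names, not part of the playbooks.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of settings we expect the inventory/template to have filled in.
REQUIRED_SETTINGS = ("user", "team", "passkey")


def read_fah_settings(xml_text):
    """Return {tag: value} for the top-level settings in a FAH v7 config.xml."""
    root = ET.fromstring(xml_text)  # XML comments are skipped by the default parser
    settings = {}
    for elem in root:
        # Assumption: the value lives in a "v" or "value" attribute.
        value = elem.get("v", elem.get("value"))
        if value is not None:
            settings[elem.tag] = value
    return settings


def missing_settings(xml_text):
    """Return the required settings that are absent or empty."""
    settings = read_fah_settings(xml_text)
    return [name for name in REQUIRED_SETTINGS if not settings.get(name)]


# Made-up example: the passkey was rendered as an empty string.
sample_config = """<config>
  <!-- User Information -->
  <user v='ExampleUser'/>
  <team v='0'/>
  <passkey v=''/>
  <slot id='0' type='CPU'/>
</config>"""

print(missing_settings(sample_config))  # → ['passkey']
```

An empty result means the user, team and passkey settings were all rendered with non-empty values.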
Work with FAH Commands
- Check points per day (PPD) and queue information:
ansible clients -a "FAHClient --send-command ppd"
ansible clients -a "FAHClient --send-command queue-info"
- Tell all nodes to finish their work unit then pause
ansible -i ../hosts all -a "FAHClient --send-command finish"
- It's good form to finish the work units that are assigned to you before removing FAH from the nodes
- Pausing and unpausing folding
ansible clients -a "FAHClient --send-pause"
ansible clients -a "FAHClient --send-unpause"
Check Queue State
- Check the FAHClient queue states
ansible-playbook checkfahqueues.yml
- checkfahqueues.yml
---
- name: Check queue
  hosts: clients
  remote_user: ansible
  become: true
  tasks:
    - name: Gather queue information
      shell: "FAHClient --send-command queue-info"
      register: fahqueue
      changed_when: false
    - name: Queue status
      # drop the first 4 lines and the last line of output, parse the rest as JSON, show unit 0's state
      debug:
        msg: "{{ (fahqueue.stdout_lines[4:-1] | join | from_json)[0].state }}"
- If the queue is empty, the test will fail
- If the node has paused folding, the status will be “READY”
- If the node is currently folding, the status will be “RUNNING”
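To see what the Jinja expression in checkfahqueues.yml does, here is a Python sketch of the same steps:

- drop the first four lines and the last line of the FAHClient output
- join what remains
- parse it as JSON
- read the state of the first work unit

The sample output uses placeholder header and trailer lines, not real FAHClient text, and first_queue_state is a hypothetical name. Unlike the playbook, the sketch returns None for an empty queue instead of failing.

```python
import json


def first_queue_state(stdout, skip_head=4, skip_tail=1):
    """Python equivalent of (stdout_lines[4:-1] | join | from_json)[0].state."""
    lines = stdout.splitlines()                               # Ansible's stdout_lines
    body = "".join(lines[skip_head:len(lines) - skip_tail])   # Jinja's join uses "" by default
    units = json.loads(body)                                  # from_json
    if not units:
        return None  # the playbook's [0] lookup fails at this point instead
    return units[0]["state"]


# Placeholder output: four header lines, a JSON list split over two lines, one trailer line.
sample_stdout = "\n".join([
    "<header 1>", "<header 2>", "<header 3>", "<header 4>",
    '[{"id": "00",',
    ' "state": "RUNNING"}]',
    "<trailer>",
])

print(first_queue_state(sample_stdout))  # → RUNNING
```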
Reconfigure FAH Clients
You can edit the inventory file and re-apply the configuration using the following playbook. Be sure the “team” variable is present in the inventory file.
- reconfigure-fah.yml
---
- hosts: all
  become: true  # writing /etc/fahclient/config.xml and restarting the service need root
  tasks:
    - name: Install FaH config
      template:
        src: /home/ansible/my-project/fah/roles/fahclient/templates/sample-config.xml.j2
        dest: /etc/fahclient/config.xml
    - name: Restart FaH
      systemd:
        name: FAHClient
        state: restarted
Check CPU Utilization
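Check the CPU load on the nodes. FAH runs its cores at low (nice) priority by default, so most of its load shows up as nice CPU usage.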
ansible-playbook getcpu.yml
- getcpu.yml
---
- hosts: all
  gather_facts: false
  tasks:
    - name: Get CPU usage
      shell: "top -b -n 1"
      register: top
      changed_when: false
    - name: Set CPU usage facts
      set_fact:
        user_cpu: "{{ top.stdout_lines[2].split()[1] }}"
        system_cpu: "{{ top.stdout_lines[2].split()[3] }}"
        nice_cpu: "{{ top.stdout_lines[2].split()[5] }}"
    - name: Output CPU usage facts
      debug:
        msg:
          - "User CPU usage: {{ user_cpu }}"
          - "System CPU usage: {{ system_cpu }}"
          - "Nice CPU usage: {{ nice_cpu }}"
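The fixed split() indexes in getcpu.yml assume every value on top's %Cpu(s) line is separated from the preceding text by whitespace. top prints each value in a fixed-width field, so a value of 100.0 can run straight into the text before it and shift the indexes. That is a likely case on a node where FAH keeps every core busy. The Python sketch below pairs each number with its label using a regular expression instead. The sample line is constructed for illustration, not captured output, and parse_cpu_line is a hypothetical name.

```python
import re

# A number followed by one of top's CPU-state labels, e.g. "0.3 us" or "100.0 ni".
_FIELD = re.compile(r"(\d+(?:\.\d+)?)\s*(us|sy|ni|id|wa|hi|si|st)\b")


def parse_cpu_line(line):
    """Parse top's '%Cpu(s):' summary line into {label: percent}."""
    return {label: float(value) for value, label in _FIELD.findall(line)}


# Constructed summary line for a node whose cores are all busy with niced work.
busy = "%Cpu(s):  0.0 us,  0.0 sy,100.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st"

print(busy.split()[5])             # → ni,   (index 5 no longer holds the nice value)
print(parse_cpu_line(busy)["ni"])  # → 100.0
```

The same idea could be expressed in the playbook with a regex filter. The point is to key on the label rather than on the field position.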
Check Temperature
In this example we will look into monitoring the CPU and chipset temperature of our NUCs.
Install lm-sensors
- Option 1 - Ad Hoc
ansible clients -b -m apt -a "name=lm-sensors state=present update_cache=true"
- Option 2 - Playbook in /home/ansible/my-project/fah/lm-sensors.yml
- lm-sensors.yml
---
- name: lm-sensors install
  hosts: clients
  remote_user: ansible
  become: true
  tasks:
    - name: Install lm-sensors
      apt:
        name: lm-sensors
        update_cache: true
    - name: Detect sensors
      ansible.builtin.command: sensors-detect --auto
ansible-playbook lm-sensors.yml
Read the Sensors
ansible clients -a sensors
ansible clients -a "sensors -j"
- check-temps.yml
---
- name: Check temperature
  hosts: clients
  remote_user: ansible
  become: true
  tasks:
    - name: Gather CPU temperature
      shell: "sensors | grep 'Package id 0:' | cut -c17-20"
      register: temp
      changed_when: false
    - name: Check CPU temperature
      fail:
        msg: "{{ temp.stdout }}"
      when: (temp.stdout | int > 80)
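The cut -c17-20 step in check-temps.yml depends on the exact column layout of the sensors text output. Recent lm-sensors releases can emit JSON with sensors -j, which avoids counting characters. This Python sketch assumes the coretemp layout: a chip key containing "Package id N" features, each with a tempN_input reading. The sample data is made up, the 80 °C threshold is the one used in check-temps.yml, and package_temps and too_hot are hypothetical names.

```python
import json

THRESHOLD_C = 80.0  # same limit as check-temps.yml


def package_temps(sensors_json):
    """Return {"chip/feature": temperature} for every 'Package id N' reading in sensors -j output."""
    data = json.loads(sensors_json)
    temps = {}
    for chip, features in data.items():
        for feature, readings in features.items():
            # Assumption: package features are dicts with a key ending in "_input".
            if feature.startswith("Package id") and isinstance(readings, dict):
                for key, value in readings.items():
                    if key.endswith("_input"):
                        temps[f"{chip}/{feature}"] = float(value)
    return temps


def too_hot(sensors_json, threshold=THRESHOLD_C):
    """Return only the package readings above the threshold."""
    return {name: t for name, t in package_temps(sensors_json).items() if t > threshold}


# Made-up sensors -j style data for one node.
sample_sensors = json.dumps({
    "coretemp-isa-0000": {
        "Adapter": "ISA adapter",
        "Package id 0": {"temp1_input": 83.0, "temp1_max": 100.0, "temp1_crit": 100.0},
        "Core 0": {"temp2_input": 81.0},
    }
})

print(too_hot(sample_sensors))  # → {'coretemp-isa-0000/Package id 0': 83.0}
```

Per node, you would feed it the stdout of ansible clients -a "sensors -j". An empty result means every package reading is at or below the threshold.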
See this link for more information on using this sensors information with Ansible.