UncleNUC Wiki

Second chance for NUCs


Ansible Playbook - FAH Installation

In our previous step we checked the health of CMOS batteries on our Stack of NUCs.

Now we are going to create and run an Ansible playbook to set up Folding at Home (FAH) on the nodes. I have updated the playbook by ajacocks to support the current release and added a quick fix.

Please note that the NUCs I am using in this lab have only 4 cores, and for some WUs (work units) the client will use only 3 of them. So don't expect to score many points with these small boxes.

Purpose:

  • Demonstrate running a complex workload: a service combined with configuration files

Step 1 - Install the fahcontrol app on NUC 1

The official download here does not work with Ubuntu 22.04. Use https://github.com/cdberkstresser/fah-control.

  1. Open a shell on NUC 1
  2. Install packages
    • sudo apt-get install -y python3-stdeb python3-gi python3-all python3-six debhelper dh-python gir1.2-gtk-3.0
  3. Clone the repo and run the command

Step 2 - Install the FAH client using Ansible

From NUC 1, log in to the Ansible control node, NUC 2.

  1. Change directory to /home/ansible/my-project
  2. git clone --branch support-7-6-21 https://github.com/doritoes/fah.git
  3. Change directory to /home/ansible/my-project/fah
  4. Modify file /home/ansible/my-project/fah/inventory
    • copy your ansible node IPs from the file /home/ansible/my-project/hosts to the [clients] section
    • chost='(IP of Control Node)'
    • cpass='(control-node-password)'
    • username='(Yourname @ folding@home)'
    • team='(if you support a team)'
    • passkey='(redacted passkey from folding@home)'
  5. ansible-playbook main.yml
    • if you encounter a DNS lookup failure on some or all nodes
      • your wireless router should be setting DNS information as part of DHCP
      • did you disable the DNS stub resolver in earlier steps?
    • if you cannot connect with the control app and/or you see an error regarding a locked database
      • reboot the node to clear the error
      • it seems that running the playbook on an already configured system can start multiple copies of FAH, which causes the problem; rebooting solves the issue
  6. Reboot all the clients to ensure the service registers properly and no double processes are running
    • ansible clients -m reboot
    • If you want to confirm that your FAH configuration was copied correctly, see the optional section below
  7. On NUC 1, open the FAH control program
    • Add clients one at a time in FAHControl
      • Any name you want
      • IP address of the client
      • Control password you used configuring FAH
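
To double-check the inventory edit from step 4 before running the playbook, a small shell helper can print the hosts listed under [clients]. This is only a sketch: it assumes the inventory is a plain INI-style file with one host per line, and the function name list_clients and the sample addresses are made up for illustration.

```shell
# list_clients FILE: print the first field of every host line in the [clients] section.
# Assumes an INI-style Ansible inventory; list_clients is a hypothetical helper name.
list_clients() {
  awk '
    /^\[/ { in_clients = ($0 ~ /^\[clients\][ \t]*$/); next }  # a section header switches context
    in_clients && NF && $1 !~ /^[#;]/ { print $1 }              # skip blank lines and comments
  ' "$1"
}

# Demo on a throwaway inventory with placeholder addresses
demo_inventory=$(mktemp)
printf '%s\n' '[clients]' '192.168.1.11' '192.168.1.12' '' '[other]' '192.168.1.99' > "$demo_inventory"
list_clients "$demo_inventory"
rm -f "$demo_inventory"
```

On the control node this would be run as list_clients /home/ansible/my-project/fah/inventory, and the output compared with the addresses in /home/ansible/my-project/hosts.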

Next Step

Congratulations! Your Stack of NUCs is now fully occupied running a valuable workload! Next up is Ansible Playbook - FAH Removal, where we disable FAH and remove it.

Optional

Check FAH Status

  1. Check the FAHClient status
    • ansible-playbook checkfahstatus.yml
    • checkfahstatus.yml
      ---
      - hosts: clients
        become: true
        become_user: root
        tasks:
          - name: Get FAH service Status
            ansible.builtin.systemd:
              state: "started"
              name: "FAHClient"
            register: fah_service_status
          - name: Show status
            debug:
              msg: "{{ fah_service_status.status.ActiveState }}"
  2. Check the config file
    • ansible-playbook check-fah-config.yml
    • check-fah-config.yml
      ---
      - hosts: clients
        become: true
        become_user: root
        tasks:
          - name: Read FAH client configuration from config.xml
            command: cat /etc/fahclient/config.xml
            register: configuration
            changed_when: false
          - name: Dump configuration
            debug:
              var: configuration
  3. Check points per day (PPD) and queue information:
    • ansible clients -a "FAHClient --send-command ppd"
    • ansible clients -a "FAHClient --send-command queue-info"
  4. Tell all nodes to finish their work unit then pause
    • ansible -i ../hosts all -a "FAHClient --send-command finish"
    • It's good form to finish the work units that are assigned to you before removing FAH from the nodes
  5. Pausing and unpausing folding
    • ansible clients -a "FAHClient --send-pause"
    • ansible clients -a "FAHClient --send-unpause"

Check Queue State

  1. Check the FAHClient queue states
    • ansible-playbook checkfahqueues.yml
    • checkfahqueues.yml
      ---
      - name: Check queue
        hosts: clients
        remote_user: ansible
        become: true
        tasks:
          - name: Gather queue information
            shell: "FAHClient --send-command queue-info"
            register: fahqueue
            changed_when: false
          - name: Queue status
            debug:
              msg: "{{(fahqueue.stdout_lines[4:-1] | join | from_json)[0].state }}"
  2. If the queue is empty, the test will fail
  3. If the node has paused folding, the status will be “READY”
  4. If the node is currently folding, the status will be “RUNNING”
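
For a quick look outside a playbook, the state values can also be pulled out of the raw queue-info text with standard tools. The following is a sketch under assumptions: it expects each slot to appear as a JSON-like object with a "state" key, which is what the [0].state lookup in the playbook suggests. The sample text is hand-written rather than captured from a real client, and queue_states is a hypothetical helper name.

```shell
# queue_states: read queue-info text on stdin and print one state per work unit.
# Assumes entries look like  "state": "RUNNING"  somewhere in the output.
queue_states() {
  grep -o '"state"[ ]*:[ ]*"[^"]*"' |   # keep only the "state": "..." pairs
    sed 's/.*"\([^"]*\)"$/\1/'          # strip everything but the quoted value
}

# Demo on hand-written sample text (not real FAHClient output)
printf '%s\n' 'PyON 1 units' '[' '  {"id": "00", "state": "RUNNING"}' ']' '---' | queue_states
```

Piping the ad hoc command through it, for example ansible clients -a "FAHClient --send-command queue-info" | queue_states, lists the states without host names, so it is most readable when limited to a single host.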

Reconfigure FAH Clients

You can edit the inventory file and re-apply the configuration using the following playbook. Be sure the “team” variable is present in the inventory file.

reconfigure-fah.yml
---
- hosts: all
  # root is needed to write /etc/fahclient/config.xml and restart the service
  become: true
  tasks:
    - name: Install FaH config
      template:
        src: /home/ansible/my-project/fah/roles/fahclient/templates/sample-config.xml.j2
        dest: /etc/fahclient/config.xml
    - name: Restart FaH
      systemd:
        name: FAHClient
        state: restarted

Check CPU Utilization

Check the CPU load on the nodes

  • ansible-playbook getcpu.yml
  • getcpu.yml
    ---
    - hosts: all
      gather_facts: false
      tasks:
        - name: Get CPU usage
          shell: "top -b -n 1"
          register: top
          changed_when: false
        - name: Set CPU usage facts
          set_fact:
            user_cpu: "{{ top.stdout_lines[2].split()[1] }}"
            system_cpu: "{{ top.stdout_lines[2].split()[3] }}"
            nice_cpu: "{{ top.stdout_lines[2].split()[5] }}"
        - name: Output CPU usage facts
          debug:
            msg:
              - "User CPU usage: {{ user_cpu }}"
              - "System CPU usage: {{ system_cpu }}"
              - "Nice CPU usage: {{ nice_cpu }}"
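
The playbook above picks values by position after splitting the %Cpu(s) line on whitespace. As far as I can tell, some top builds print a value of 100.0 flush against the neighbouring text (for example "sy,100.0 ni"), which shifts those positions. The sketch below splits on commas instead and matches the us, sy and ni labels. It assumes the usual procps "%Cpu(s):" line layout, and cpu_usage is a hypothetical helper name.

```shell
# cpu_usage: read `top -b -n 1` output on stdin and print "label value"
# for user (us), system (sy) and nice (ni) CPU time.
cpu_usage() {
  sed -n 's/^%Cpu(s):[ ]*//p' |   # keep only the CPU summary line, minus its prefix
    tr ',' '\n' |                 # one "value label" pair per line
    awk '$2 == "us" || $2 == "sy" || $2 == "ni" { print $2, $1 }'
}

# Demo on a hand-written sample line where the nice value touches the comma
printf '%s\n' '%Cpu(s):  1.5 us,  0.7 sy,100.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st' | cpu_usage
```

On a node it would be used as top -b -n 1 | cpu_usage.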

Check Temperature

In this example we will look into monitoring the CPU and chipset temperature of our NUCs.

Install lm-sensors

  • Option 1 - Ad Hoc
    • ansible -i hosts all -m apt -a "name=lm-sensors state=present"
  • Option 2 - Playbook in /home/ansible/my-project/fah/lm-sensors.yml
    • lm-sensors.yml
      ---
      - name: lm-sensors install
        hosts: clients
        remote_user: ansible
        become: true
        tasks:
          - name: Install lm-sensors
            apt:
              name: lm-sensors
              update_cache: true
          - name: Detect sensors
            ansible.builtin.command: sensors-detect --auto
    • ansible-playbook lm-sensors.yml

Check Temperature

  • ansible clients -a sensors
  • ansible clients -a "sensors -j"

check-temps.yml
---
- name: Check temperature
  hosts: clients
  remote_user: ansible
  become: true
  tasks:
    - name: Gather CPU temperature
      shell: "sensors | grep 'Package id 0:' | cut -c17-20"
      register: temp
      changed_when: false
    - name: Check CPU temperature
      fail:
        msg: "{{ temp.stdout }}"
      when: (temp.stdout | int > 80)

See this link for more information on using this sensors information with Ansible.
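
The cut -c17-20 step in check-temps.yml depends on the temperature sitting at fixed column positions. A field-based variant is sketched below. It assumes the common lm-sensors layout where the reading is the fourth whitespace-separated field of the "Package id 0:" line. The sample line is hand-written, and package_temp is a hypothetical helper name.

```shell
# package_temp: read `sensors` output on stdin and print the package temperature
# as a bare number (sign and unit removed).
package_temp() {
  awk '/^Package id 0:/ { gsub(/[^0-9.]/, "", $4); print $4 }'
}

# Demo on a hand-written sample line
printf '%s\n' 'Package id 0:  +45.0°C  (high = +105.0°C, crit = +105.0°C)' | package_temp
```

In the playbook, the shell task could then run sensors piped through the same awk expression instead of grep and cut.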

lab/stack_of_nucs/ansible_playbook_-_fah_installation.1684461161.txt.gz · Last modified: 2023/05/19 01:52 by user