UncleNUC Wiki

Second chance for NUCs



This is an old revision of the document!


Ansible Playbook - FAH Installation

In our previous step we checked the health of CMOS batteries on our Stack of NUCs.

Now we are going to create and run an Ansible playbook to set up Folding at Home (FAH) on the nodes. I have updated the playbook by ajacocks to install the current release and added a quick fix.

Please note that the NUCs I am using in this lab have only 4 cores, and for some WUs (work units) the client will use only 3 cores. So don't expect to score many points with these small boxes.

Current Issues:

  • working on improving the installation playbook (main.yml) so it doesn't report failures when the correct version is not selected for installation
  • working on improving how all running instances are terminated when installing the fahclient.service
  • working on getting the config.xml file installed correctly

Purpose:

  • Demonstrate running a complex workload: a service combined with configuration files

References

Step 1 - Install the fahcontrol app on NUC 1

The official download here does not work with Ubuntu 22.04. Use https://github.com/cdberkstresser/fah-control.

  1. Open a shell on NUC 1
  2. Install packages
    • sudo apt-get install -y python3-stdeb python3-gi python3-all python3-six debhelper dh-python gir1.2-gtk-3.0
  3. Clone the repo and run the command

Step 2 - Install the FAH client using Ansible

From NUC 1, log in to the Ansible control node, NUC 2.

  1. Change directory to /home/ansible/my-project
  2. git clone --branch support-7-6-21 https://github.com/doritoes/fah.git
  3. Change directory to /home/ansible/my-project/fah
  4. Modify file /home/ansible/my-project/fah/inventory
    • copy your ansible node IPs from the file /home/ansible/my-project/hosts to the [clients] section
    • chost='(IP of Control Node)'
    • cpass='(control-node-password)'
    • username='(Yourname @ folding@home)'
    • team='(if you support a team)'
    • passkey='(redacted passkey from folding@home)'
  5. ansible-playbook main.yml
    • if you encounter a DNS lookup failure on some or all nodes
      • your wireless router should be setting DNS information as part of DHCP
      • did you disable the DNS stub resolver in earlier steps?
    • if you cannot connect with the control app and/or you see an error regarding a locked database
      • reboot the node to clear the error
      • it seems that running the playbook on an already configured system can start multiple copies of FAH and cause the problem; rebooting solves the issue
  6. Reboot all the clients to ensure the service registers properly and no duplicate processes are running
    • ansible clients -m reboot
    • If you want to confirm your FAH configuration copied correctly, see the optional section below
  7. On NUC 1, open the FAH control program
    • Add clients one at a time in FAHControl
      • Any name you want
      • IP address of the client
      • Control password you used configuring FAH

Next Step

Congratulations! Your Stack of NUCs is now fully occupied running a valuable workload! Next up is Ansible Playbook - FAH Removal, where we disable FAH and remove it.

Optional

Check FAH Status

  1. Check the FAHClient status
    • ansible-playbook check-fah-status.yml
    • check-fah-status.yml
      ---
      - hosts: clients
        become: true
        become_user: root
        tasks:
          - name: Get FAH service Status
            ansible.builtin.systemd:
              state: "started"
              name: "FAHClient"
            register: fah_service_status
          - name: Show status
            debug:
              msg: "{{ fah_service_status.status.ActiveState }}"
  2. Check the config file
    • ansible-playbook check-fah-config.yml
    • check-fah-config.yml
      ---
      - hosts: clients
        become: true
        become_user: root
        tasks:
          - name: Read FAH client from config.xml
            shell: cat /etc/fahclient/config.xml
            changed_when: false
            register: configuration
          - name: Dump configuration
            debug:
              var: configuration.stdout_lines
  3. If the configuration did not apply successfully, re-configure using the following playbook. Be sure the “team” variable is present in the inventory file.
    • reconfigure-fah.yml
      ---
      - hosts: all
        become: true  # writing /etc/fahclient/config.xml and restarting the service need root
        tasks:
          - name: Install FaH config
            template:
              src: /home/ansible/my-project/fah/roles/fahclient/templates/sample-config.xml.j2
              dest: /etc/fahclient/config.xml
          - name: Restart FaH
            systemd:
              name: FAHClient
              state: restarted
    • You might need to reboot the NUCs, not just the service
      • ansible clients -m reboot

Work with FAH Commands

  1. Check points per day (PPD) and queue information:
    • ansible clients -a "FAHClient --send-command ppd"
    • ansible clients -a "FAHClient --send-command queue-info"
  2. Tell all nodes to finish their work unit then pause
    • ansible -i ../hosts all -a "FAHClient --send-command finish"
    • It's good form to finish the work units that are assigned to you before removing FAH from the nodes
  3. Pausing and unpausing folding
    • ansible clients -a "FAHClient --send-pause"
    • ansible clients -a "FAHClient --send-unpause"

Check Queue State

  1. Check the FAHClient queue states
    • ansible-playbook check-fah-queues.yml
    • check-fah-queues.yml
      ---
      - name: Check queue
        hosts: clients
        remote_user: ansible
        become: true
        tasks:
          - name: Gather queue information
            shell: "FAHClient --send-command queue-info"
            register: fahqueue
            changed_when: false
          - name: Queue status
            debug:
              msg: "{{(fahqueue.stdout_lines[4:-1] | join | from_json)[0].state }}"
  2. If the queue is empty, the test will fail
  3. If the node has paused folding, the status will be “READY”
  4. If the node is currently folding, the status will be “RUNNING”
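
The filter chain in the playbook's msg line drops the first four lines and the last line of the command output, joins what is left into one string, and parses it as JSON. A minimal shell sketch of the same trimming follows. The sample text is a made-up placeholder, not real FAHClient output, and the "id" key is hypothetical; only the line positions matter here.

```shell
#!/bin/sh
# Mimic the Jinja2 expression  stdout_lines[4:-1] | join
# i.e. drop the first four lines and the last line, then join the rest.
trim_and_join() {
    sed '1,4d;$d' | tr -d '\n'
}

# Hypothetical placeholder text: four wrapper lines, a JSON list, one trailer line.
sample='wrapper line 1
wrapper line 2
wrapper line 3
wrapper line 4
[
  {"id": "00", "state": "RUNNING"}
]
trailer line'

payload=$(printf '%s\n' "$sample" | trim_and_join)
printf '%s\n' "$payload"
```

If the real output has a different number of wrapper lines than the slice assumes, what remains is not valid JSON and the from_json step fails, so it is worth inspecting fahqueue.stdout_lines with a debug task before relying on the slice.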

Check Work Unit ETAs

  1. Check the FAHClient unit ETAs
    • ansible-playbook check-fah-eta.yml
    • check-fah-eta.yml
      ---
      - name: Check queue ETA
        hosts: clients
        remote_user: ansible
        become: true
        tasks:
          - name: Gather queue information
            shell: "FAHClient --send-command queue-info"
            register: fahqueue
            changed_when: false
          - name: Queue status
            debug:
              msg: "{{(fahqueue.stdout_lines[4:-1] | join | from_json)[0].eta }}"

Check CPU Utilization

Check the CPU load on the nodes

  • ansible-playbook getcpu.yml
  • getcpu.yml
    ---
    - hosts: all
      gather_facts: false
      tasks:
        - name: Get CPU usage
          shell: "top -b -n 1"
          register: top
          changed_when: false
        - name: Set CPU usage facts
          set_fact:
            user_cpu: "{{ top.stdout_lines[2].split()[1] }}"
            system_cpu: "{{ top.stdout_lines[2].split()[3] }}"
            nice_cpu: "{{ top.stdout_lines[2].split()[5] }}"
        - name: Output CPU usage facts
          debug:
            msg:
              - "User CPU usage: {{ user_cpu }}"
              - "System CPU usage: {{ system_cpu }}"
              - "Nice CPU usage: {{ nice_cpu }}"
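
The set_fact task depends on the position of whitespace-separated fields in the third line of the top output. The sketch below shows the same field selection with awk, which counts fields from 1 where the playbook's split() counts from 0. The function name cpu_fields and the sample line are illustrative assumptions, not captured from a real node.

```shell
#!/bin/sh
# Print the user, system, and nice CPU percentages from a top(1) summary line.
# Reads top output on stdin and acts only on the "%Cpu(s):" line.
# awk fields 2, 4, and 6 correspond to split()[1], [3], and [5] in the playbook.
cpu_fields() {
    awk '/^%Cpu/ { print "user=" $2, "system=" $4, "nice=" $6 }'
}

# Illustrative sample line; the numbers are made up.
sample='%Cpu(s):  1.5 us,  0.7 sy, 92.1 ni,  5.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st'
printf '%s\n' "$sample" | cpu_fields
```

On a node this could be fed with top -b -n 1 | cpu_fields. Positional parsing like this is fragile: if top prints a value with no space after the colon, which may happen when a value reaches 100.0, every field shifts by one.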

Check Temperature

In this example we will look into monitoring the CPU and chipset temperature of our NUCs.

  1. Install lm-sensors
    • Option 1 - Ad Hoc
      • ansible -i hosts all -m apt -a "name=lm-sensors state=present"
    • Option 2 - Playbook in /home/ansible/my-project/fah/lm-sensors.yml
      • lm-sensors.yml
        ---
        - name: lm-sensors install
          hosts: clients
          remote_user: ansible
          become: true
          tasks:
            - name: Install lm-sensors
              apt:
                name: lm-sensors
                update_cache: true
            - name: Detect sensors
              ansible.builtin.command: sensors-detect --auto
    • ansible-playbook lm-sensors.yml
  2. Check Temperature
    • ansible clients -a sensors
    • ansible clients -a "sensors -j"
    • ansible-playbook check-temps.yml
      • check-temps.yml
        ---
        - name: Check temperature
          hosts: clients
          remote_user: ansible
          become: true
          tasks:
            - name: Gather CPU temperature
              shell: "sensors | grep 'Package id 0:' | cut -c17-20"
              register: temp
              changed_when: false
            - name: Check CPU temperature
              fail:
                msg: "{{ temp.stdout }}"
              when: (temp.stdout | int > 80)
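
The playbook above picks the temperature out of the sensors output by fixed character columns. As a sketch of an alternative, the function below keys on the "Package id 0:" label and strips everything except digits and the decimal point, so it does not depend on column positions. The function name package_temp and the sample line are assumptions for illustration; the sample follows the usual coretemp layout with made-up values.

```shell
#!/bin/sh
# Print the CPU package temperature as a whole number of degrees Celsius.
# Reads sensors(1) output on stdin and keys on the label, not on column positions.
package_temp() {
    awk '/^Package id 0:/ {
        gsub(/[^0-9.]/, "", $4)   # "+45.0°C" becomes "45.0"
        print int($4)
        exit
    }'
}

# Illustrative sample line; the values are made up.
sample='Package id 0:  +45.0°C  (high = +100.0°C, crit = +100.0°C)'
temp=$(printf '%s\n' "$sample" | package_temp)

# Same threshold as the "when" condition in check-temps.yml.
if [ "$temp" -gt 80 ]; then
    echo "too hot: ${temp}"
else
    echo "ok: ${temp}"
fi
```

On a node the function would be fed with sensors | package_temp, and the same awk expression could replace the grep and cut pipeline in the shell task if the column-based version proves brittle.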

See this link for more information on using this sensors information with Ansible.

lab/stack_of_nucs/ansible_playbook_-_fah_installation.1684607304.txt.gz · Last modified: 2023/05/20 18:28 by user