lab:stack_of_nucs:ansible_playbook_-_fah_installation
lab:stack_of_nucs:ansible_playbook_-_fah_installation [2023/05/19 01:57] – [Check FAH Status] user | lab:stack_of_nucs:ansible_playbook_-_fah_installation [2024/05/06 02:10] (current) – removed user | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Ansible Playbook - FAH Installation ====== | ||
- | In our previous step we [[Ansible Playbook - CMOS|checked the health of CMOS batteries]] on our [[start|Stack of NUCs]]. | ||
- | Now we are going to create and run an Ansible playbook to set up [[https:// | ||
- | |||
- | Please note that the NUCs I am using in this lab have only 4 cores, and for some WUs (work units) the client will only use 3 cores. So don't expect to score many points with these small boxes. | ||
- | |||
- | Purpose: | ||
- | * Demonstrate running a complex workload: a service combined with configuration files | ||
- | |||
- | References | ||
- | * [[https:// | ||
- | ====== Step 1 - Install the fahcontrol app on NUC 1 ====== | ||
- | The official download [[https:// | ||
- | |||
- | - Open a shell on [[NUC 1]] | ||
- | - Install packages | ||
- | * '' | ||
- | - Clone the repo and run the command | ||
- | * '' | ||
- | * '' | ||
- | * '' | ||
- | |||
- | ====== Step 2 - Install the FAH client using Ansible ====== | ||
- | From [[NUC 1]], log in to the Ansible control node, [[NUC 2]]. | ||
- | |||
- | - Change directory to / | ||
- | - < | ||
- | - Change directory to ''/ | ||
- | - Modify file ''/ | ||
- | * copy your ansible node IPs from the file / | ||
- | * chost=' | ||
- | * cpass=' | ||
- | * username=' | ||
- | * team=' | ||
- | * passkey=' | ||
- | - '' | ||
- | * if you encounter a DNS lookup failure on some or all nodes | ||
- | * your wireless router should be setting DNS information as part of DHCP | ||
- | * did you disable the DNS stub resolver in earlier steps? | ||
- | * if you cannot connect with the control app and/or you see an error regarding a locked database | ||
- | * reboot the node to clear the error | ||
- | * it seems that running the playbook on an already configured system can start multiple copies of FAH, which causes the problem; rebooting solves the issue | ||
- | - Reboot all the clients to ensure the service registers properly and no duplicate processes are running | ||
- | * '' | ||
- | * If you want to confirm your FAH configuration copied correctly, see the optional section below | ||
- | - On [[NUC 1]], open the FAH control program | ||
- | * Add clients one at a time in FAHControl | ||
- | * Any name you want | ||
- | * IP address of the client | ||
- | * Control password you used configuring FAH | ||
- | |||
- | ====== Next Step ====== | ||
- | Congratulations! Your [[start|Stack of NUCs]] is now fully occupied running a valuable workload! Next up is [[Ansible Playbook - FAH Removal]], where we disable FAH and remove it. | ||
- | |||
- | ====== Optional ====== | ||
- | ===== Check FAH Status ===== | ||
- | - Check the FAHClient status | ||
- | * '' | ||
- | * <file yaml checkfahstatus.yml> | ||
- | --- | ||
- | - hosts: clients | ||
- | become: true | ||
- | become_user: | ||
- | tasks: | ||
- | - name: Get FAH service Status | ||
- | ansible.builtin.systemd: | ||
- | state: " | ||
- | name: " | ||
- | register: fah_service_status | ||
- | - name: Show status | ||
- | debug: | ||
- | msg: "{{ fah_service_status.status.ActiveState }}" | ||
- | </ | ||
- | - Check the config file | ||
- | * <file yaml check-fah-config.yml> | ||
- | --- | ||
- | - hosts: clients | ||
- | become: true | ||
- | become_user: | ||
- | tasks: | ||
- | - name: Read FAH client from config.xml | ||
- | shell: cat / | ||
- | changed_when: | ||
- | register: configuration | ||
- | - name: Dump configuration | ||
- | debug: | ||
- | var: configuration.stdout_lines | ||
- | </ | ||
- | - If the configuration did not apply successfully, | ||
- | * <file yaml dabba.yml> | ||
- | </ | ||
- | - Check points per day (PPD) and queue information: | ||
- | * <code bash> | ||
- | * <code bash> | ||
- | - Tell all nodes to finish their work unit then pause | ||
- | * <code bash> | ||
- | * It's good form to finish the work units that are assigned to you before removing FAH from the nodes | ||
- | - Pausing and unpausing folding | ||
- | * <code bash> | ||
- | * <code bash> | ||
- | |||
- | ===== Check Queue State ===== | ||
- | - Check the FAHClient queue states | ||
- | * '' | ||
- | * <file yaml checkfahqueues.yml> | ||
- | --- | ||
- | - name: Check queue | ||
- | hosts: clients | ||
- | remote_user: | ||
- | become: true | ||
- | tasks: | ||
- | - name: Gather queue information | ||
- | shell: " | ||
- | register: fahqueue | ||
- | changed_when: | ||
- | - name: Queue status | ||
- | debug: | ||
- | msg: " | ||
- | </ | ||
- | - If the queue is empty, the test will fail | ||
- | - If the node has paused folding, the status will be " | ||
- | - If the node is currently folding, the status will be " | ||
- | ===== Reconfigure FAH Clients ===== | ||
- | You can edit the '' | ||
- | |||
- | <file yaml reconfigure-fah.yml> | ||
- | --- | ||
- | - hosts: all | ||
- | tasks: | ||
- | - name: Install FaH config | ||
- | template: | ||
- | src: / | ||
- | dest: / | ||
- | - name: Restart FaH | ||
- | systemd: | ||
- | name: FAHClient | ||
- | state: restarted | ||
- | </ | ||
- | |||
- | ===== Check CPU Utilization ===== | ||
- | Check the CPU load on the nodes | ||
- | * '' | ||
- | * <file yaml getcpu.yml> | ||
- | --- | ||
- | - hosts: all | ||
- | gather_facts: | ||
- | tasks: | ||
- | - name: Get CPU usage | ||
- | shell: "top -b -n 1" | ||
- | register: top | ||
- | changed_when: | ||
- | - name: Set CPU usage facts | ||
- | set_fact: | ||
- | user_cpu: "{{ top.stdout_lines[2].split()[1] }}" | ||
- | system_cpu: "{{ top.stdout_lines[2].split()[3] }}" | ||
- | nice_cpu: "{{ top.stdout_lines[2].split()[5] }}" | ||
- | - name: Output CPU usage facts | ||
- | debug: | ||
- | msg: | ||
- | - "User CPU usage: {{ user_cpu }}" | ||
- | - " | ||
- | - "Nice CPU usage: {{ nice_cpu }}" | ||
- | </ | ||
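The indices used in `set_fact` above depend on how the third line of `top -b -n 1` splits on whitespace. The following is a minimal Python sketch of that same split, using a made-up sample line (an assumption, not output captured from this lab) and a hypothetical helper name, to show which fields positions 1, 3, and 5 select. The `split()` call in the playbook expression is the ordinary Python string method, so the behavior should match.

```python
# Hypothetical sample of the CPU summary line printed by `top -b -n 1`;
# real output varies with the top version and locale.
SAMPLE_LINE = (
    "%Cpu(s):  5.9 us,  2.0 sy, 91.2 ni,  0.9 id,"
    "  0.0 wa,  0.0 hi,  0.0 si,  0.0 st"
)


def parse_cpu_line(line: str) -> dict:
    """Split on whitespace and pick the same positions the playbook uses."""
    fields = line.split()
    return {
        "user_cpu": fields[1],    # value before "us,"
        "system_cpu": fields[3],  # value before "sy,"
        "nice_cpu": fields[5],    # value before "ni,"
    }


if __name__ == "__main__":
    print(parse_cpu_line(SAMPLE_LINE))
```

The values come back as strings. If a label and a number ever run together without whitespace, every later position shifts, so it is worth checking the raw line on your own nodes before trusting the indices.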
- | |||
- | ===== Check Temperature ===== | ||
- | In this example we will look into monitoring the CPU and chipset temperature of our NUCs. | ||
- | |||
- | Install lm-sensors | ||
- | * Option 1 - Ad Hoc | ||
- | * < | ||
- | * Option 2 - Playbook in / | ||
- | * <file yaml lm-sensors.yml> | ||
- | --- | ||
- | - name: lm-sensors install | ||
- | hosts: clients | ||
- | remote_user: | ||
- | become: true | ||
- | tasks: | ||
- | - name: Install lm-sensors | ||
- | apt: | ||
- | name: lm-sensors | ||
- | update_cache: | ||
- | - name: Detect sensors | ||
- | ansible.builtin.command: | ||
- | </ | ||
- | * '' | ||
- | |||
- | Check Temperature | ||
- | * '' | ||
- | * '' | ||
- | |||
- | <file yaml check-temps.yml> | ||
- | --- | ||
- | - name: Check temperature | ||
- | hosts: clients | ||
- | remote_user: | ||
- | become: true | ||
- | tasks: | ||
- | - name: Gather CPU temperature | ||
- | shell: " | ||
- | register: temp | ||
- | changed_when: | ||
- | - name: Check CPU temperature | ||
- | fail: | ||
- | msg: "{{ temp.stdout }}" | ||
- | when: (temp.stdout | int > 80) | ||
- | </ | ||
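The `when:` condition above relies on Jinja2's `int` filter, which, as far as I know, falls back to 0 when the text is not numeric. The sketch below imitates that behavior in plain Python to show why the shell command should emit a bare number. The helper names are hypothetical, and this is an approximation of the filter, not Ansible's or Jinja2's actual code.

```python
def jinja_like_int(value, default=0):
    """Approximate Jinja2's `int` filter: try int, then float, then default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        try:
            return int(float(value))
        except (TypeError, ValueError, OverflowError):
            return default


def too_hot(stdout: str, limit: int = 80) -> bool:
    """Mirror the playbook condition `temp.stdout | int > 80`."""
    return jinja_like_int(stdout) > limit


if __name__ == "__main__":
    # A bare integer, a bare decimal, and a reading with sign and unit attached.
    for sample in ["85", "62.0", "+91.0°C"]:
        print(repr(sample), too_hot(sample))
```

Under these assumptions a reading such as `+91.0°C` converts to 0, so the check passes silently even though the node is hot. Stripping the sign and unit in the shell pipeline before the comparison avoids that.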
- | |||
- | See [[https:// |
lab/stack_of_nucs/ansible_playbook_-_fah_installation.1684461455.txt.gz · Last modified: 2023/05/19 01:57 by user