The iLO Advanced license key is not displayed completely in the HTML interface; it is censored with XXXX.
You can display it as XML with the following URL:
https://IP-ADDRESS/xmldata?item=cpqkey
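If you prefer the command line, the same XML can also be fetched with curl (the -k flag skips certificate validation for the typically self-signed iLO certificate; depending on your iLO configuration, authentication may be required):
curl -k "https://IP-ADDRESS/xmldata?item=cpqkey"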
An update of the virtual switch vs0 can fail on Nutanix single-node clusters because the cluster is unable to perform a rolling reboot. You have to convert the switch, change the uplinks via the old manage_ovs commands, and convert it back (see the sketch after the two commands below):
manage_ovs --bridge_name br0 --interfaces <interface names> --bond_name br0-up --bond_mode balance-tcp --lacp_mode fast --lacp_fallback true update_uplinks
manage_ovs --bridge_name br0 --interfaces <interface names> --bond_name br0-up --bond_mode <active-backup/balanced-slb> update_uplinks
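The convert and convert-back steps around the manage_ovs change are done with acli; a minimal sketch, assuming an AOS version (5.19 or later) that provides these subcommands, so verify the exact syntax for your release:
acli net.disable_virtual_switch
(change the uplinks with manage_ovs as shown above, then migrate the bridge back into the virtual switch:)
acli net.migrate_br_to_virtual_switch br0 vs_name=vs0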
If you have problems expanding your cluster with nodes from a different vendor, be careful, as this is not supported. Sometimes you have to do this to replace all nodes of a cluster with new ones from a different vendor, so do it only for migration purposes and never run a mixed-vendor configuration in production environments!
(In our case, we replaced Nutanix NX-Nodes with HPE DX)
The expand may fail because the new vendor has a different license class. In that case you can edit the file /etc/nutanix/hardware_config.json on the nodes you want to add:
In the section "hardware_attributes" you will find the entry
"license_class": "software_only",
Remove this entry on all nodes you want to add and perform a genesis restart on each of these nodes. (You have to use sudo to edit the file!)
After expanding the cluster and removing the old nodes, put this entry back into hardware_config.json and perform an allssh genesis restart on the cluster.
Then everything should be fine.
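A minimal sketch of the edit on one of the new nodes (the backup filename is just an example):
nutanix@cvm$ sudo cp /etc/nutanix/hardware_config.json /home/nutanix/hardware_config.json.bak
nutanix@cvm$ sudo vi /etc/nutanix/hardware_config.json
(delete the "license_class": "software_only", line under "hardware_attributes", save, then restart genesis:)
nutanix@cvm$ genesis restart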
You may also get an error in the prechecks from test_cassandra_ssd_size_check, even if your SSD sizes are fine according to KB-8842.
In that case the /etc/nutanix/hcl.json file on the existing cluster may not contain the SSDs used in the new nodes. Check whether the hcl.json on the new nodes contains these SSDs. If it does, copy the hcl.json from one of the new nodes to all CVMs in the running cluster, perform an allssh genesis restart on the cluster and try again.
(Copy the file somewhere to /tmp and then use sudo mv to get it to /etc/nutanix/)
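A hedged example for one target CVM in the existing cluster (the IP is a placeholder; repeat the copy for every CVM):
nutanix@new-node-cvm$ scp /etc/nutanix/hcl.json nutanix@<cluster-cvm-ip>:/tmp/hcl.json
nutanix@cluster-cvm$ sudo mv /tmp/hcl.json /etc/nutanix/hcl.json
nutanix@cluster-cvm$ allssh genesis restart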
Licensing of the "converted" cluster may not work as expected, because the cluster still thinks it is on "license_class": "appliance".
You can check your downloaded CSF for the license class of the nodes. If it is wrong, engage Nutanix Support; they will provide a script to check and to set it:
python /home/nutanix/ncc/bin/license_config_zk_util.py --show=config
python /home/nutanix/ncc/bin/license_config_zk_util.py --convert_license_class --license_class=<software_only/appliance>
This fixes it in PC versions prior to pc.2023.1.0.2.
Later versions can use "ncli license update-license license-class=<software_only/appliance>".
After updating, wait about one hour, then try licensing from PC again (check the CSF to verify that the nodes now show the correct license_class).
Sometimes after removing a node (or all nodes, in case of renewing the hardware) you can't reach the virtual cluster IP (the CVMs are reachable) or you can't resolve alerts. In this case the Prism leader did not move correctly to a new node. You can fix this by restarting Prism:
allssh genesis stop prism;cluster start
Error occurred while fetching tls: String index out of range: -1
If you get this error while replacing the certificate through the UI, try not to use the "Browse file" dialog. Just open the certificate in a text editor and copy/paste it into the required field.
If somebody accidentally updated firmware on an HPE DX node manually, it won't be possible to install firmware (SPP) on the node via LCM, as Nutanix expects specific SPP versions on the node. (In LCM, hover over the question mark and it shows you which versions are supported for the update.)
You can fix this and make LCM believe you have such a version (but be careful: it should be an SPP version that reflects the firmware versions that are actually installed):
SSH to the affected host and then:
[root@host ~]# export TMP=/usr/local/tmpdir
[root@host ~]# ilorest --nologo login
[root@host ~]# ilorest --nologo select Bios.
(The trailing dot is important!)
[root@host ~]# ilorest --nologo get ServerOtherInfo
Now you can see the SPP version the node reports.
Set the version to nothing if something wrong is inside:
[root@host ~]# ilorest --nologo set ServerOtherInfo='' --commit
Set the version to the correct expected version:
[root@host ~]# ilorest --nologo set ServerOtherInfo='2021.04.0.02' --commit
Check if everything is fine:
[root@host ~]# ilorest --nologo get ServerOtherInfo
[root@host ~]# ilorest --nologo logout
Now you can perform an LCM inventory and the SPP update should be possible.
If KB article 1540 from Nutanix about cleaning up the CVMs does not help, the clickstream folder can be cleaned up.
Please run the following command on the affected CVM:
find ~/data/prism/clickstream -name 'client_tracking*' -mmin +7200 -type f -exec /usr/bin/rm '{}' +
and clean up the audit logs in ~/data/prism with e.g. (run in that folder):
/bin/rm api_audit-2022*
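To see how much space these folders actually consume before and after the cleanup, a simple du on the CVM will do:
nutanix@cvm$ du -sh ~/data/prism/clickstream ~/data/prism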
To view the current password age for the admin user, execute the chage command as follows:
nutanix@cvm$ sudo chage -l admin
Last password change : May 22, 2007
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7
To disable password aging / expiration for the admin user, type the command as follows to set:
Minimum Password Age to 0
Maximum Password Age to 99999
Password Inactive to -1
Account Expiration Date to -1
nutanix@cvm$ sudo chage -m 0 -M 99999 -I -1 -E -1 admin
If LCM firmware updates (e.g. HPE DX SPP updates) cannot be applied, you can try the following steps:
1. Clear Browser Cache
If clearing the cache does not solve the problem, you can "reset" your LCM by running a catalog cleanup from any of the CVMs:
Run the following 4 commands one by one:
- python /home/nutanix/cluster/bin/lcm/lcm_catalog_cleanup
- python /home/nutanix/cluster/bin/lcm/lcm_catalog_cleanup --lcm_cleanup_cpdb
- allssh genesis stop catalog; cluster start
- cluster restart_genesis
After this, please retry performing inventory.
If you can't register vCenter in Nutanix because a wrong IP is discovered, check vpxa.cfg on the ESXi hosts. Possibly you have to change the vpxa.cfg on the hosts. This mainly happens after changing the VCSA IP and/or hostname. First check whether vCenter can communicate with the hosts.
If communication is established and it's not possible to permanently change the configuration, or VMware automatically reverts the configuration, have a look at this setting in the VCSA (IP address missing, name wrong, etc.):
Change VCSA FQDN and IP in a Nutanix Environment:
First, check that everything is up and running and there are no problems at all.
Make sure you have all needed credentials (SSO admin, domain admin, etc.).
Make sure you have the new FQDN and the new IP address as well as the corresponding DNS entry. If a new VLAN is needed, it should already be available in your vSphere environment.
Take a backup (snapshot) of your VCSA.
Then uninstall the vCenter plugins and unregister your vCenter from Nutanix Prism.
To check the Redundancy Factor of the Cluster:
ncli cluster get-redundancy-state
To check the Container Redundancy Factor:
ncli ctr ls|egrep -i "Name |Replication"
To change the RF mode of a Nutanix storage container, first show the container details:
ncli ctr ls
Identify the correct container and note its ID (all digits after the :: on the ID line).
For example:
ID : 00052c80-729d-8761-000000052fb::1085
Change the container RF mode:
ncli ctr edit rf=RF-Mode id=Container_ID (force=true)
For example:
ncli ctr edit rf=3 id=1085
For the NutanixManagementShare container, you need the additional force=true parameter.
To check which NIC is active, connect to the AHV host and run the following command:
[root@ahv ~]# ovs-appctl bond/show
In the command output, the active interface is marked as the active slave.
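If you want to check all hosts at once, the same command can be wrapped in hostssh from a CVM, as used elsewhere in this article:
nutanix@cvm$ hostssh "ovs-appctl bond/show"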
AHV
To reset the IPMI via ipmitool (if it is not responding, etc.), you can connect to an AHV host and run:
ipmitool mc reset cold
or you can do this for the whole cluster from CVM with:
hostssh "ipmitool mc reset cold"
This resets all IPMIs in the cluster.
To add a user, you can check which user ID is unused via:
ipmitool user list 1
Then:
** Create user 3
ipmitool user set name 3 ILOUSER
** Set the password for user 3
ipmitool user set password 3 PASSWORD
** Set admin privileges (4) for user 3
ipmitool channel setaccess 1 3 link=on ipmi=on callin=on privilege=4
** Enable user 3
ipmitool user enable 3
You can run all of these commands from a CVM with hostssh in " " to set this for the whole cluster.
Example: hostssh "ipmitool user set name 3 USER"
ESXi
On an ESXi host you can use ./ipmicfg, which can do most of this in one command:
hostssh "./ipmicfg -user add <ID> <NAME> <PASSWORD> <PRIVILEGE>"
Example:
hostssh "./ipmicfg -user add 3 ADMIN P@ssw0rd 4"
To reset the BMC use:
./ipmicfg -r
To disable HA completely on the cluster, use the following command as the nutanix user on any CVM in that cluster:
acli ha.update enable_failover=0
acli ha.get
After running acli ha.get, you should see 'ha_state: "kAcropolisHADisabled"'; the state is also reflected on the Prism dashboard under the VM Summary widget.
To re-enable HA, please use the following command:
acli ha.update enable_failover=1
acli ha.get
This command disables HA on the complete cluster, not on an individual node. It is recommended to disable HA only if it is absolutely required.
1. Access the VCSA Appliance Management: https://vcsa_ip:5480
2. Navigate to Access and click Edit under Access Settings.
3. Switch on Enable SSH Login and click OK.
4. Access the VCSA via SSH.
5. Type: shell
6. Change the default shell to Bash by typing: chsh -s /bin/bash root
Now you should be able to access VCSA with WinSCP with the following settings:
Use "scp" on port 22.
(tested with VCSA 6.7)
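When you are done, you may want to switch the default shell for root back; assuming the standard VCSA appliance shell path /bin/appliancesh (verify on your appliance first):
chsh -s /bin/appliancesh root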
First check that everything is fine in your cluster.
To manually change the amount of memory of a CVM, first log on to the CVM and shut it down via the cvm_shutdown script:
cvm_shutdown -P
Then log on to the corresponding AHV host and list all VMs:
virsh list --all
Check that you see your CVM powered off, then continue to change the memory configuration:
virsh setmem <CVM_NAME> <Gigabytes>G --config
virsh setmaxmem <CVM_NAME> <Gigabytes>G --config
Example: virsh setmem NTNX-CVM1 36G --config
You can check if this succeeded:
virsh dominfo <CVM_NAME>
then you can power on the CVM again:
virsh start <CVM_NAME>
Wait for the CVM to come up, then go to the GUI and wait for everything to return to normal (especially the Data Resiliency Status should change back to OK).
After everything is ok, you can continue with the next CVM.
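If you prefer checking the resiliency state from the command line instead of the GUI, the node fault tolerance status can also be queried with ncli:
nutanix@cvm$ ncli cluster get-domain-fault-tolerance-status type=node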
You can use the external IP address reconfiguration script in the following scenarios:
Change the IP addresses of the CVMs in the same subnet.
Change the IP addresses of the CVMs to a new or different subnet.
In this scenario, the external IP address reconfiguration script works successfully if the new subnet is configured with the required switches and the CVMs can communicate with each other in the new subnet.
Change the IP addresses of the CVMs to a new or different subnet if you are moving the cluster to a new physical location.
In this scenario, the external IP address reconfiguration script works successfully if the CVMs can still communicate with each other in the old subnet.
Following is the summary of steps that you must perform to change the IP addresses on a Nutanix cluster.
1. Check the health of the cluster infrastructure and resiliency (for more information, see the Before you begin section of this document).
2. Stop the cluster.
3. Change the VLAN and NIC Teaming configurations as necessary.
Note: Check the connectivity between CVMs and hosts, that is, all the hosts must be reachable from all the CVMs and vice versa before you perform step 4. If any CVM or host is not reachable, contact Nutanix Support for assistance.
4. Change the CVM IP addresses by using the external_ip_reconfig script.
5. Change the hypervisor host IP addresses if necessary.
6. Restart the CVMs.
7. Perform the initial series of validation steps.
8. Start the cluster.
9. Perform the final series of validation steps.
10. Change the IPMI IP addresses if necessary.
Note: Please be sure to follow the steps in the order given.
The external IP address reconfiguration script performs the following tasks:
Checks if the cluster is stopped.
Puts the cluster in reconfiguration mode.
Restarts Genesis.
Prompts you to type the new netmask, gateway, and external IP addresses, and updates them.
Updates the IP addresses of the Zookeeper hosts.
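A minimal sketch of starting the reconfiguration from any CVM once the cluster is stopped (verify the exact invocation and path in the portal document referenced below):
nutanix@cvm$ external_ip_reconfig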
For detailed information on each step of the automated method, please refer to the following portal document:
The host IPs can be changed directly from the DCUI if the hypervisor is ESXi; if the hypervisor is AHV, please follow the process described in the following document:
1. Check the installed software and check whether NGT was installed before being removed.
2. Check using PowerShell whether the NGT Infrastructure Component Packages are still present after removing the "old" NGT:
Get-wmiobject -class win32_product -filter "Name = 'Nutanix Guest Tools Infrastructure Components Package 1'"
then remove it:
Get-Package -Name "Nutanix Guest Tools Infrastructure Components Package 1" | Uninstall-Package -Force
3. Clean the registry in HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall: search for "Nutanix" and delete the key.
4. Reinstall NGT
You will receive the following error message in Prism Element:
System has x aged third-party backup snapshot(s) and they may unnecessarily consume storage space in the cluster.
To show the Configuration (from any CVM):
hostssh "./ipmitool lan print 1"
Apply this setting so that the IPMI does not get locked:
hostssh "./ipmitool lan set 1 bad_pass_thresh 0 0 0 0"
You can remove the NVIDIA driver from AHV with the following command:
yum remove NVIDIA-vGPU-ahv-2019-440.121.x86_64
Please replace NVIDIA-vGPU-ahv-2019-440.121.x86_64 with your currently installed driver package.
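To find the exact name of the currently installed driver package, you can query the RPM database on the AHV host first:
[root@ahv ~]# rpm -qa | grep -i nvidia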