Developer Guide

  • 2021.2
  • 06/11/2021
  • Public

Troubleshooting

This topic covers troubleshooting for the data streams optimizer.
Problem
Possible Cause / Solution
Environment file parsing error due to mis-formatted JSON.
Cause: File contains one or more invalid JSON characters.
Solution: Escape JSON-specific characters such as double quotes and backslashes:
  • For a double quote ("), replace it with the escape sequence \"
  • For a backslash (\), replace it with the escape sequence \\
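As a minimal sketch of the escaping rule above, Python's standard json module applies both escapes automatically when it serializes a string. The helper name escape_for_json and the sample value are hypothetical and only illustrate the escaping; they are not part of the environment file format.

```python
import json


def escape_for_json(value):
    """Return `value` as a JSON string literal with " and \\ escaped."""
    # json.dumps writes " as \" and \ as \\ and adds the surrounding quotes.
    return json.dumps(value)


# Hypothetical value containing double quotes and a backslash.
raw_value = 'echo "a\\b"'
escaped = escape_for_json(raw_value)
print(escaped)

# Round trip: parsing the escaped text gives back the original string.
assert json.loads(escaped) == raw_value
```

Pasting the printed literal into the environment file as the value should avoid the parsing error, assuming the rest of the file is valid JSON.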
Workload exit status is 127.
Cause: Permission issue.
Solution: Make sure you have “execute” permissions for the workload validation script.
Capsule generation script exit status is 127.
Cause: Permission issue.
Solution: Make sure you have “execute” permissions for the subregion capsule script (subregion_capsule.py).
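The permission check for both scripts can be sketched as follows, assuming a Linux host. The helper name ensure_executable is hypothetical, and the path you pass in depends on where your workload validation script or subregion_capsule.py is located.

```python
import os
import stat


def ensure_executable(path):
    """Add the owner execute bit to `path` if missing. Return True if changed."""
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        # The owner can already execute the file; nothing to do.
        return False
    os.chmod(path, mode | stat.S_IXUSR)
    return True
```

For example, ensure_executable("subregion_capsule.py") would add the execute bit if the script is in the current directory, which is an assumption about your setup.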
“no module named …”
Cause: Prerequisites are not satisfied.
Solution: Follow the steps in the prerequisites section of Readme.md.
Failed to reconnect via SSH after reboot.
Cause: IP address changes after reboot.
Solution: Use a static IP address for the target system or use the full hostname to establish the SSH connection.
“Failed to generate capsule.”
Cause: Subregion capsule tool issue.
Solution: Check that the instructions from ${TCC_TOOLS_PATH}/capsule were executed correctly, and check the paths in the environment file.
Some streams are missing in the tool flow log.
Cause: In the requirements file, these streams lack unique IDs.
Solution: In the requirements file, make sure that each tccRequirements field has a unique ID.
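A uniqueness check along these lines may help locate the offending entries. This is only a sketch: the layout of the requirements file is not shown in this topic, so the function takes a list of IDs that you have already extracted, and the name find_duplicate_ids is hypothetical.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return the IDs that appear more than once, in first-seen order."""
    counts = Counter(ids)
    return [stream_id for stream_id in counts if counts[stream_id] > 1]
```

An empty result means every ID in the list is unique.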
The data streams optimizer hangs after the “Rebooting <hostname>” output message during target reboot.
Cause: Most likely a “Broken pipe” error, caused by an unexpected exit from the SSH session to the target system.
Solutions:
  • Fix the “Broken pipe” issue (may be SSH settings or network issue).
    1. Fix the connection issue by reviewing your IP addresses, connection settings, and cable connections.
    2. Review SSH settings.
  • Or change the reconnection timeout and reboot settings in the environment file:
    1. Increase reconnection_timeout to 70.
    2. Use the shutdown -r 1 command instead of the reboot command.
After trying a solution, rerun the tuning flow from the beginning.
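To check manually how long the target takes to become reachable again after a reboot, a polling loop such as the following can be used. This is a sketch under stated assumptions, not the tool's own reconnection logic: it only tests whether the TCP port accepts connections, the function name wait_for_port is hypothetical, port 22 is the conventional SSH port, and the 70-second default mirrors the value suggested above.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=70.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the port is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait and retry.
            time.sleep(interval)
    return False
```

If the function returns False for your target within the configured time, the reconnection timeout is probably too short or the connection itself needs attention.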
On 11th Gen Intel® Core™ processors, a system hang may occur intermittently when running the reboot command.
Cause: If the system detects hardware errors, the Functional Safety (FuSa) feature, PCIe Interrupt Error Handling (IEH), may attempt an additional system reset which can get stuck at postcode 0x0b7f.
Solution: Hard reset to regain control of the system.
Temporary workaround for the system hang after reboot: In the BIOS menu, disable IEH by setting Intel Advanced Menu/PCH-IO Configuration/IEH Mode to Bypass Mode.
The MRL application freezes on “Start validation.”
Cause: Application may freeze due to a high volume of interrupts.
Solution: Increase the --outliers argument to 400 or higher. This argument is optional, so you may need to add it to the command.
The MRL application cannot detect the processor automatically.
Cause: Your processor name does not correspond with any processor in the list of known processors.
Solution: Add the --processor {TGL-U|EHL} argument to the workload command in the requirements file. With this option, you can specify your processor manually. Using unsupported processors can cause errors and odd results.
The MRL application shows “Unable to mmap memory.”
Cause: Some drivers prevent /dev/mem from being used.
Solution: Unload the stmmac_pci and stmmac drivers if you are using a TSN device, or the igc driver if you are using an I225 device.
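Before unloading anything, you can check which of the drivers named above are currently loaded. The sketch below assumes a Linux target where /proc/modules lists one loaded module per line with the module name in the first column. The function name loaded_blocking_drivers is hypothetical.

```python
# Driver names taken from the solution above.
BLOCKING_DRIVERS = ("stmmac_pci", "stmmac", "igc")


def loaded_blocking_drivers(proc_modules_text, candidates=BLOCKING_DRIVERS):
    """Return the candidate drivers whose names appear in /proc/modules-style text."""
    # The first whitespace-separated field of each line is the module name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in candidates if name in loaded]
```

On the target, pass in the contents of /proc/modules, for example loaded_blocking_drivers(open("/proc/modules").read()). The drivers it reports are the ones to unload.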
The data streams optimizer applied a tuning configuration to your system, but you want to reset your system to default settings.
Solution: Disable the Data Streams Optimizer, Software SRAM, and Intel® TCC Mode options in the BIOS. After reboot, your system will be reset.
After disabling RTCM, the system freezes or the following error occurs: “Could not set up firmware update: Invalid argument. ERROR: Failed to apply buffer capsule”.
Unexpected performance results occur, the fwupdate software does not work so capsules are not applied, or enabling/disabling RTCM does not work.
Some cores are offline according to the lscpu command output. Output example: “Off-line CPU(s) list: 1-3.”
Cause: Combining RTCM and data streams optimizer may result in offline cores and a number of different errors.
Possible solutions:
Solution 1: Reflash the BIOS.
Solution 2:
  1. Disable the Data Streams Optimizer option in the BIOS. Reboot.
  2. Disable RTCM. Reboot.
  3. Re-run the data streams optimizer with the “SoftwareSRAM” compatibility option set in the requirements file.
    Note: The performance effect of the data streams optimizer will not be visible, because tuning configurations will not be applied by the BIOS. Stop the tuning process after the first applied configuration.
  4. Enable the Data Streams Optimizer option in the BIOS. Reboot.
  5. Enable RTCM. Reboot.
Your system is now ready to use the data streams optimizer with RTCM enabled.
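To confirm the offline-core symptom described in this entry, or to verify that it is gone after applying a solution, the “Off-line CPU(s) list” line of the lscpu output can be parsed as sketched below. The function name offline_cpus is hypothetical, and the parser assumes the list uses comma-separated numbers and ranges, as in the “1-3” example.

```python
import re


def offline_cpus(lscpu_output):
    """Return CPU numbers from the 'Off-line CPU(s) list' line, or [] if absent."""
    match = re.search(r"Off-line CPU\(s\) list:\s*([\d,-]+)", lscpu_output)
    if not match:
        return []
    cpus = []
    for part in match.group(1).split(","):
        if not part:
            continue
        if "-" in part:
            # Expand a range such as "1-3" into 1, 2, 3.
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus
```

An empty list means lscpu reported no offline cores in the text you passed in.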
