<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org">
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii">
<title>Chapter&nbsp;9.&nbsp;Known Issues</title>
<meta name="generator" content="DocBook XSL Stylesheets V1.68.1">
<link rel="start" href="index.html" title=
"NVIDIA Accelerated Linux Graphics Driver README and Installation Guide">
<link rel="up" href="installationandconfiguration.html" title=
"Part&nbsp;I.&nbsp;Installation and Configuration Instructions">
<link rel="prev" href="commonproblems.html" title=
"Chapter&nbsp;8.&nbsp;Common Problems">
<link rel="next" href="dma_issues.html" title=
"Chapter&nbsp;10.&nbsp;Allocating DMA Buffers on 64-bit Platforms">
</head>
<body>
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Chapter&nbsp;9.&nbsp;Known
Issues</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href=
"commonproblems.html">Prev</a>&nbsp;</td>
<th width="60%" align="center">Part&nbsp;I.&nbsp;Installation and
Configuration Instructions</th>
<td width="20%" align="right">&nbsp;<a accesskey="n" href=
"dma_issues.html">Next</a></td>
</tr>
</table>
<hr></div>
<div class="chapter" lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a name="knownissues" id=
"knownissues"></a>Chapter&nbsp;9.&nbsp;Known Issues</h2>
</div>
</div>
</div>
<p>The following problems still exist in this release and are in
the process of being resolved.</p>
<div class="variablelist">
<p class="title"><b>Known Issues</b></p>
<dl>
<dt><span class="term">Cache Aliasing</span></dt>
<dd>
<p>Cache aliasing occurs when multiple mappings to a physical page
of memory have conflicting caching states, such as cached and
uncached. Due to these conflicting states, data in that physical
page may become corrupted when the processor's cache is flushed. If
that page is being used for DMA by a driver such as NVIDIA's
graphics driver, this can lead to hardware stability problems and
system lockups.</p>
<p>NVIDIA has encountered bugs with some Linux kernel versions that
lead to cache aliasing. Although some systems will run perfectly
fine when cache aliasing occurs, other systems will experience
severe stability problems, including random lockups. Users
experiencing stability problems due to cache aliasing will benefit
from updating to a kernel that does not cause cache aliasing to
occur.</p>
</dd>
<dt><span class="term">Valgrind</span></dt>
<dd>
<p>The NVIDIA OpenGL implementation makes use of self-modifying
code. To force Valgrind to retranslate this code after it is
modified, run Valgrind with the following command line
option:</p>
<pre class="screen">
--smc-check=all
</pre>
<p>Without this option, Valgrind may execute incorrect code, causing
incorrect behavior and reports of the form:</p>
<pre class="screen">
==30313== Invalid write of size 4
</pre>
</dd>
<dt><a name="msi_interrupts" id="msi_interrupts"></a><span class=
"term">Driver fails to initialize when MSI interrupts are
enabled</span></dt>
<dd>
<p>The Linux NVIDIA driver uses Message Signaled Interrupts (MSI)
by default. This provides compatibility and scalability benefits,
mainly due to the avoidance of IRQ sharing.</p>
<p>Some systems have problems supporting MSI but work correctly
with virtual wire interrupts. These problems manifest as an
inability to start X with the NVIDIA driver, or as CUDA
initialization failures. The NVIDIA driver will then report an
error indicating that the NVIDIA kernel module does not appear to
be receiving interrupts generated by the GPU.</p>
<p>Problems have also been seen with suspend/resume while MSI is
enabled. All known problems have been fixed, but if you observe
problems with suspend/resume that you did not see with previous
drivers, disabling MSI may help you.</p>
<p>NVIDIA is working on a long-term solution to improve the
driver's out-of-the-box compatibility with system configurations
that do not fully support MSI.</p>
<p>MSI interrupts can be disabled via the NVIDIA kernel module
parameter "NVreg_EnableMSI=0". This can be set on the command line
when loading the module, or more appropriately via your
distribution's kernel module configuration files (such as those
under /etc/modprobe.d/).</p>
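<p>As a minimal sketch, assuming the kernel module is installed
under the name "nvidia", a file such as the following under
/etc/modprobe.d/ applies the parameter each time the module is
loaded. The file name nvidia-msi.conf is a hypothetical choice made
for illustration:</p>
<pre class="screen">
# /etc/modprobe.d/nvidia-msi.conf (hypothetical file name)
# Disable Message Signaled Interrupts in the NVIDIA kernel module.
options nvidia NVreg_EnableMSI=0
</pre>
<p>The setting takes effect the next time the module is loaded.</p>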
</dd>
<dt><a name="console_restore" id="console_restore"></a><span class=
"term">Console restore behavior</span></dt>
<dd>
<p>The Linux NVIDIA driver uses the nvidia-modeset module for
console restore whenever it can. Currently, the improved console
restore mechanism is used on systems that boot with the UEFI
Graphics Output Protocol driver, and on systems that use supported
VESA linear graphical modes. Note that VGA text, color index,
planar, banked, and some linear modes cannot be supported, and will
use the older console restore method instead.</p>
<p>When the new console restore mechanism is in use and the
nvidia-modeset module is initialized (e.g. because an X server is
running on a different VT, nvidia-persistenced is running, or the
nvidia_drm module is loaded with the <code class=
"computeroutput">modeset=1</code> parameter), then nvidia-modeset
will respond to hot plug events by displaying the console on as
many displays as it can. Note that to save power, it may not
display the console on all connected displays.</p>
</dd>
<dt><a name="vulkan_devices" id="vulkan_devices"></a><span class=
"term">Vulkan and device enumeration</span></dt>
<dd>
<p>Starting with the X.Org X server version 1.20.7, it is possible
to enumerate all the NVIDIA devices in the system if the
application is able to open a connection to the X server. However,
such applications will only be able to create an Xlib or XCB
swapchain on the device driving the X screen. Such a device can be
identified by using the vkGetPhysicalDeviceSurfaceSupportKHR()
API.</p>
<p>Prior to the X.Org X server version 1.20.7, it is not possible
to enumerate multiple devices if one of them will be used to
present to an X11 swapchain. It is still possible to enumerate
multiple devices even if one of them is driving an X screen, if the
devices will be used for Vulkan offscreen rendering or presenting
to a display swapchain. For that, make sure that the application
cannot open a display connection to an X server by, for example,
unsetting the DISPLAY environment variable.</p>
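<p>As a minimal sketch, the variable can be removed from the
environment of the shell from which the Vulkan application is then
started. The application launch itself is not shown, since it is
specific to each application:</p>
<pre class="screen">
# Remove DISPLAY from this shell's environment so that applications
# started from it cannot find an X server to connect to.
unset DISPLAY
</pre>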
</dd>
<dt><a name="profiling" id="profiling"></a><span class=
"term">Restricting access to GPU performance counters</span></dt>
<dd>
<p>NVIDIA Developer Tools allow developers to debug, profile, and
develop software for NVIDIA GPUs. GPU performance counters are
integral to these tools. By default, access to the GPU performance
counters is restricted to root, and other users with the
CAP_SYS_ADMIN capability, for security reasons. If developers
require access to the NVIDIA Developer Tools, a system
administrator can accept the security risk and allow access to
users without the CAP_SYS_ADMIN capability.</p>
<p>Wider access to GPU performance counters can be granted by
setting the kernel module parameter
"NVreg_RestrictProfilingToAdminUsers=0" in the nvidia.ko kernel
module. This can be set on the command line when loading the
module, or more appropriately via your distribution's kernel module
configuration files (such as those under /etc/modprobe.d/).</p>
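<p>As a minimal sketch, assuming the kernel module is installed
under the name "nvidia", the following configuration file grants
access to all users. The file name nvidia-profiling.conf is a
hypothetical choice made for illustration:</p>
<pre class="screen">
# /etc/modprobe.d/nvidia-profiling.conf (hypothetical file name)
# Allow users without CAP_SYS_ADMIN to read GPU performance counters.
options nvidia NVreg_RestrictProfilingToAdminUsers=0
</pre>
<p>The setting takes effect the next time the module is loaded.</p>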
</dd>
<dt><a name="RedHat" id="RedHat"></a><span class="term">Driver
fails to initialize with some versions of RHEL 8</span></dt>
<dd>
<p>Some versions of Red Hat Enterprise Linux 8 kernels have a bug
that causes driver initialization to fail with an error such
as:</p>
<pre class="screen">
NVRM: Xid (PCI:0000:09:00): 79, pid=2172, GPU has fallen off the bus.
NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.
NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x26:0x65:1239)
NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
</pre>
<p>See the Red Hat knowledge base article <a href=
"https://access.redhat.com/solutions/5825061" target=
"_top">https://access.redhat.com/solutions/5825061</a> to find the
specific affected and fixed kernel versions.</p>
</dd>
<dt><a name="IBT" id="IBT"></a><span class="term">Driver fails to
load on Linux kernel versions 5.18 through 5.18.19 with
CONFIG_X86_KERNEL_IBT enabled</span></dt>
<dd>
<p>The NVIDIA driver fails to load on IBT (Indirect Branch
Tracking) supported CPUs running Linux kernel versions 5.18 to
5.18.19, when IBT is enabled, with the following error:</p>
<pre class="screen">
error "traps: Missing ENDBR:"
</pre>
<p>This issue is not seen with Linux kernels having the following
commit:</p>
<pre class="screen">
commit 3c6f9f77e618 (objtool: Rework ibt and extricate from stack validation)
</pre>
<p>This commit is included in Linux kernel versions 5.19 and
later, and the NVIDIA driver's IBT support works with kernels that
contain it. On kernels without that commit, use the kernel boot
parameter "ibt=off" as a workaround.</p>
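<p>How a boot parameter is set depends on the boot loader. As a
sketch that assumes a GRUB 2 based distribution reading
/etc/default/grub, the parameter can be appended to the existing
kernel command line. The word "quiet" below is only a placeholder
for whatever parameters are already present:</p>
<pre class="screen">
# /etc/default/grub (assumed location; varies by distribution)
GRUB_CMDLINE_LINUX_DEFAULT="quiet ibt=off"
</pre>
<p>After editing the file, regenerate the GRUB configuration with
your distribution's tool and reboot.</p>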
</dd>
<dt><span class="term">Notebooks</span></dt>
<dd>
<p>If you are using a notebook, see the "Known Notebook Issues" in
<a href="configlaptop.html" title=
"Chapter&nbsp;16.&nbsp;Configuring a Notebook">Chapter&nbsp;16,
<i>Configuring a Notebook</i></a>.</p>
</dd>
<dt><a name="texture_clamping" id=
"texture_clamping"></a><span class="term">Texture seams in Quake 3
engine</span></dt>
<dd>
<p>Many games based on the Quake 3 engine set their textures to use
the <code class="computeroutput">GL_CLAMP</code> clamping mode when
they should be using <code class=
"computeroutput">GL_CLAMP_TO_EDGE</code>. This was an oversight
made by the developers because some legacy NVIDIA GPUs treat the
two modes as equivalent. The result is seams at the edges of
textures in these games. To mitigate this, older versions of the
NVIDIA display driver remapped <code class=
"computeroutput">GL_CLAMP</code> to <code class=
"computeroutput">GL_CLAMP_TO_EDGE</code> internally to emulate the
behavior of the older GPUs. This workaround is now disabled by
default. To re-enable it, uncheck the "Use Conformant Texture
Clamping" checkbox in nvidia-settings before starting any affected
applications.</p>
</dd>
<dt><span class="term">FSAA</span></dt>
<dd>
<p>When FSAA is enabled (the __GL_FSAA_MODE environment variable is
set to a value that enables FSAA and a multisample visual is
chosen), the rendering may be corrupted when resizing the
window.</p>
</dd>
<dt><span class="term">libGL DSO finalizer and pthreads</span></dt>
<dd>
<p>When a multithreaded OpenGL application exits, it is possible
for libGL's DSO finalizer (also known as the destructor, or
"_fini") to be called while other threads are executing OpenGL
code. The finalizer needs to free resources allocated by libGL.
This can cause problems for threads that are still using these
resources. Setting the environment variable "__GL_NO_DSO_FINALIZER"
to "1" will work around this problem by forcing libGL's finalizer
to leave its resources in place. These resources will still be
reclaimed by the operating system when the process exits. Note that
the finalizer is also executed as part of dlclose(3), so if you
have an application that dlopens(3) and dlcloses(3) libGL
repeatedly, "__GL_NO_DSO_FINALIZER" will cause libGL to leak
resources until the process exits. Using this option can improve
stability in some multithreaded applications, including Java3D
applications.</p>
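<p>As a minimal sketch, the variable can be exported in the shell
from which the affected application is then started. The
application launch itself is not shown, since it is specific to each
application:</p>
<pre class="screen">
# Ask libGL's finalizer to leave its resources in place for
# applications started from this shell.
export __GL_NO_DSO_FINALIZER=1
</pre>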
</dd>
<dt><span class="term">Thread cancellation</span></dt>
<dd>
<p>Canceling a thread (see pthread_cancel(3)) while it is executing
in the OpenGL driver causes undefined behavior. For applications
that wish to use thread cancellation, it is recommended that
threads disable cancellation using pthread_setcancelstate(3) while
executing OpenGL or GLX commands.</p>
</dd>
</dl>
</div>
<p>This section describes problems that will not be fixed. Usually,
the source of the problem is beyond the control of NVIDIA. The
following is the list of problems:</p>
<div class="variablelist">
<p class="title"><b>Problems that Will Not Be Fixed</b></p>
<dl>
<dt><span class="term">NV-CONTROL versions 1.8 and 1.9</span></dt>
<dd>
<p>Version 1.8 of the NV-CONTROL X Extension introduced target
types for setting and querying attributes as well as receiving
event notification on targets. Targets are objects like X Screens,
GPUs and Quadro Sync devices. Previously, all attributes were
described relative to an X Screen. These new bits of information
(target type and target id) were packed in an incompatible way in
the protocol stream such that addressing X Screen 1 or higher would
generate an X protocol error when mixing NV-CONTROL client and
server versions.</p>
<p>This packing problem has been fixed in the NV-CONTROL 1.10
protocol, making it possible for the older (1.7 and prior) clients
to communicate with NV-CONTROL 1.10 servers. Furthermore, the
NV-CONTROL 1.10 client library has been updated to accommodate the
target protocol packing bug when communicating with a 1.8 or 1.9
NV-CONTROL server. This means that the NV-CONTROL 1.10 client
library should be able to communicate with any version of the
NV-CONTROL server.</p>
<p>NVIDIA recommends that NV-CONTROL client applications relink
with version 1.10 or later of the NV-CONTROL client library
(libXNVCtrl.a, in the nvidia-settings-535.161.07.tar.bz2 tarball).
The version of the client library can be determined by checking the
NV_CONTROL_MAJOR and NV_CONTROL_MINOR definitions in the
accompanying nv_control.h.</p>
<p>The only web-released NVIDIA Linux driver that is affected by
this problem (i.e., the only driver to use either version 1.8 or
1.9 of the NV-CONTROL X extension) is 1.0-8756.</p>
</dd>
<dt><span class="term">CPU throttling reducing memory bandwidth on
IGP systems</span></dt>
<dd>
<p>For some models of CPU, the CPU throttling technology may affect
not only CPU core frequency, but also memory frequency/bandwidth.
On systems using integrated graphics, any reduction in memory
bandwidth will affect the GPU as well as the CPU. This can
negatively affect applications that use significant memory
bandwidth, such as video decoding using VDPAU, or certain OpenGL
operations. This may cause such applications to run with lower
performance than desired.</p>
<p>To work around this problem, NVIDIA recommends configuring your
CPU throttling implementation to avoid reducing memory bandwidth.
This may be as simple as setting a certain minimum frequency for
the CPU.</p>
<p>Depending on your operating system and/or distribution, this may
only require writing to a configuration file in the /sys or /proc
filesystems, or to another system configuration file. Please read,
or search the Internet for, documentation regarding CPU throttling
on your operating system.</p>
</dd>
<dt><span class="term">VDPAU initialization failures on supported
GPUs</span></dt>
<dd>
<p>If VDPAU returns the VDP_STATUS_NO_IMPLEMENTATION error on
a GPU which was labeled or specified as supporting PureVideo or
PureVideo HD, one possible reason is a hardware defect. After
ruling out any other software problems, NVIDIA recommends returning
the GPU to the manufacturer for a replacement.</p>
</dd>
<dt><a name="extension_string_size" id=
"extension_string_size"></a><span class="term">Some applications,
such as Quake 3, crash after querying the OpenGL extension
string</span></dt>
<dd>
<p>Some applications have bugs that are triggered when the
extension string is longer than a certain size. As more features
are added to the driver, the length of this string increases and
can trigger these sorts of bugs.</p>
<p>You can limit the extensions listed in the OpenGL extension
string to the ones that appeared in a particular version of the
driver by setting the <code class=
"computeroutput">__GL_ExtensionStringVersion</code> environment
variable to a particular version number. For example,</p>
<pre class="screen">
__GL_ExtensionStringVersion=17700 quake3
</pre>
<p>will run Quake 3 with the extension string that appeared in the
177.* driver series. Limiting the size of the extension string can
work around this sort of application bug.</p>
</dd>
<dt><a name="gnome_shell_doesnt_update" id=
"gnome_shell_doesnt_update"></a><span class="term">gnome-shell
doesn't update until a window is moved</span></dt>
<dd>
<p>Versions of libcogl prior to 1.10.x have a bug which causes
glBlitFramebuffer() calls used to update the window to be clipped
by a 0x0 scissor (see <a href=
"https://bugzilla.gnome.org/show_bug.cgi?id=690451" target=
"_top">GNOME bug #690451</a> for more details). To work around this
bug, the scissor test can be disabled by setting the <code class=
"computeroutput">__GL_ConformantBlitFramebufferScissor</code>
environment variable to 0. Note that this version of the NVIDIA
driver comes with an application profile which automatically
disables this test if libcogl is detected in the process.</p>
</dd>
<dt><a name="Xserver_compares_only_the_matrix_part_of_a_transform"
id=
"Xserver_compares_only_the_matrix_part_of_a_transform"></a><span class="term">Some
X servers ignore the RandR transform filter during a modeset
request</span></dt>
<dd>
<p>The RandR layer of the X server attempts to ignore redundant
RRSetCrtcConfig requests. If the only property changed by an
RRSetCrtcConfig request is the transform filter, some X servers
will ignore the request as redundant. This can be worked around by
also changing other properties, such as the mode, transformation
matrix, etc.</p>
</dd>
</dl>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href=
"commonproblems.html">Prev</a>&nbsp;</td>
<td width="20%" align="center"><a accesskey="u" href=
"installationandconfiguration.html">Up</a></td>
<td width="40%" align="right">&nbsp;<a accesskey="n" href=
"dma_issues.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">
Chapter&nbsp;8.&nbsp;Common Problems&nbsp;</td>
<td width="20%" align="center"><a accesskey="h" href=
"index.html">Home</a></td>
<td width="40%" align="right" valign="top">
&nbsp;Chapter&nbsp;10.&nbsp;Allocating DMA Buffers on 64-bit
Platforms</td>
</tr>
</table>
</div>
</body>
</html>