<html><head><meta name="color-scheme" content="light dark"></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">
tl;dr: AMX state is ~8k.  Signal frames can have space for this
~8k and each signal entry writes out all 8k even if it is zeros.
Skip writing zeros for AMX to speed up signal entry.  This is a
user-visible change to the sigframe ABI.

== Hardware XSAVE Background ==

XSAVE state components may be tracked by the processor as being
in their initial configuration.  Software can detect which
features are in this configuration by looking at the XSTATE_BV
field in an XSAVE buffer or with the XGETBV(1) instruction.

Both the XSAVE and XSAVEOPT instructions enumerate features as
being in the initial configuration via the XSTATE_BV field in the
XSAVE header.  However, XSAVEOPT declines to actually write
features in their initial configuration to the buffer.  XSAVE
writes the feature unconditionally, regardless of whether it is
in the initial configuration or not.

Basically, XSAVE users never need to inspect XSTATE_BV to
determine if the feature has been written to the buffer.
XSAVEOPT users *do* need to inspect XSTATE_BV.  They might also
need to clear out the buffer if they want to make an isolated
change to the state, like modifying one register.
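
As an illustration of that check, here is a minimal userspace
sketch, not part of this patch.  It assumes the standard XSAVE
layout, where the header starts at byte 512 and XSTATE_BV is its
first 8 bytes.  The helper names are made up for this example;
17 and 18 are the architectural component numbers for AMX.

```c
/* Kernel-style shorthand; assumed to be 64 bits wide. */
typedef unsigned long long u64;

/* The XSAVE header starts at byte 512; XSTATE_BV is its first 8 bytes. */
#define XSAVE_HDR_OFFSET	512

/* Architectural state-component numbers for AMX. */
#define XFEATURE_XTILE_CFG	17
#define XFEATURE_XTILE_DATA	18

/* Hypothetical helper: read the little-endian XSTATE_BV field. */
static u64 xsave_xstate_bv(const unsigned char *xsave)
{
	u64 bv = 0;
	int i;

	for (i = 7; i >= 0; i--)
		bv = (bv << 8) | xsave[XSAVE_HDR_OFFSET + i];

	return bv;
}

/*
 * Hypothetical helper: returns 1 if XSTATE_BV marks component 'nr'
 * as present.  A return of 0 means the component is in its init
 * state; in an XSAVEOPT-style buffer its area may hold stale bytes
 * and must not be interpreted.
 */
static int xsave_component_written(const unsigned char *xsave, int nr)
{
	return (xsave_xstate_bv(xsave) >> nr) & 1;
}
```

An XSAVE-style consumer can skip this check; an XSAVEOPT-style
consumer would call xsave_component_written() before touching a
component's area.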

== Software Signal / XSAVE Background ==

Signal frames have historically been written with XSAVE itself.
Each state is written in its entirety, regardless of being in its
initial configuration.

In other words, the signal frame ABI uses the XSAVE behavior, not
the XSAVEOPT behavior.

== Problem ==

This means that any application which has acquired permission to
use AMX via ARCH_REQ_XCOMP_PERM will write 8k of state to the
signal frame.  This 8k write will occur even when AMX was in its
initial configuration and software *knows* this because of
XSTATE_BV.

This problem also exists to a lesser degree with AVX-512 and its
2k of state.  However, AVX-512 use does not require
ARCH_REQ_XCOMP_PERM and is more likely to have existing users
which would be impacted by any change in behavior.

== Solution ==

Stop writing out AMX xfeatures which are in their initial state
to the signal frame.  This effectively makes the signal frame
XSAVE buffer look as if it were written with a combination of
XSAVEOPT and XSAVE behavior.  Userspace which handles XSAVEOPT-
style buffers should be able to handle this naturally.
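
The mask arithmetic can be modeled outside the kernel.  The
sketch below is a standalone approximation of what the patch
computes, not the kernel code itself: sigframe_save_mask() and
SIGFRAME_INITOPT are illustrative names, and 'xinuse' stands in
for the value XGETBV(1) would return.

```c
typedef unsigned long long u64;

/* Architectural XCR0 bit positions. */
#define XFEATURE_MASK_FP		(1ULL << 0)
#define XFEATURE_MASK_SSE		(1ULL << 1)
#define XFEATURE_MASK_AVX512		(7ULL << 5)	/* bits 5, 6 and 7 */
#define XFEATURE_MASK_XTILE_CFG		(1ULL << 17)
#define XFEATURE_MASK_XTILE_DATA	(1ULL << 18)

/* Features which may be skipped when in their init state (AMX only). */
#define SIGFRAME_INITOPT (XFEATURE_MASK_XTILE_CFG | XFEATURE_MASK_XTILE_DATA)

/*
 * Hypothetical model: 'user_xfeatures' is the set of features the
 * sigframe has room for.  A bit in 'xinuse' is set when the feature
 * is NOT in its init state.
 */
static u64 sigframe_save_mask(u64 user_xfeatures, u64 xinuse)
{
	/* Skippable features which are in the init state (or disabled): */
	u64 skip = ~xinuse & SIGFRAME_INITOPT;

	return user_xfeatures & ~skip;
}
```

With only FP and SSE in use, the resulting mask drops both AMX
components but still includes AVX-512, which keeps the plain
XSAVE behavior.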

For now, just include the AMX xfeatures: XTILECFG and XTILEDATA.
These require new ABI to use anyway, which makes their users very
unlikely to be broken.  This XSAVEOPT-like behavior should be
expected for all future dynamic xfeatures.  It may also be
extended to legacy features like AVX-512 in the future.
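
From the userspace side, a signal handler that wants the tile
data might cope with the new behavior roughly as sketched below.
This is an assumption-laden example, not code from this patch:
the function names are invented, and 'tile_offset' is passed in
by the caller (a real program would enumerate it with CPUID leaf
0xD, sub-leaf 18).

```c
typedef unsigned long long u64;

#define XSAVE_HDR_OFFSET	512	/* XSTATE_BV is the first header field */
#define XFEATURE_XTILE_DATA	18	/* architectural component number */
#define XTILE_DATA_SIZE		8192	/* 8 tile registers of 1k each */

/* Hypothetical helper: read the little-endian XSTATE_BV field. */
static u64 xsave_xstate_bv(const unsigned char *xsave)
{
	u64 bv = 0;
	int i;

	for (i = 7; i >= 0; i--)
		bv = (bv << 8) | xsave[XSAVE_HDR_OFFSET + i];

	return bv;
}

/*
 * Hypothetical helper: copy XTILEDATA out of a sigframe XSAVE
 * buffer.  If XSTATE_BV says the feature is in its init state,
 * the buffer contents are ignored and zeros are returned instead.
 */
static void sigframe_read_tile_data(const unsigned char *xsave,
				    unsigned long tile_offset,
				    unsigned char *out)
{
	int present = (xsave_xstate_bv(xsave) >> XFEATURE_XTILE_DATA) & 1;
	unsigned long i;

	for (i = 0; i < XTILE_DATA_SIZE; i++)
		out[i] = present ? xsave[tile_offset + i] : 0;
}
```

The point of the sketch is only that the XSTATE_BV check comes
before any read of the component's area.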



---

 b/arch/x86/include/asm/fpu/xcr.h    |   10 +++++++
 b/arch/x86/include/asm/fpu/xstate.h |    8 +++++
 b/arch/x86/kernel/fpu/xstate.h      |   51 ++++++++++++++++++++++++++++++++++--
 3 files changed, 67 insertions(+), 2 deletions(-)

diff -puN arch/x86/kernel/fpu/xstate.h~avoid-writing-amx-zeros arch/x86/kernel/fpu/xstate.h
--- a/arch/x86/kernel/fpu/xstate.h~avoid-writing-amx-zeros	2021-11-01 09:31:32.929770480 -0700
+++ b/arch/x86/kernel/fpu/xstate.h	2021-11-01 14:03:31.229084834 -0700
@@ -4,6 +4,7 @@
 
 #include &lt;asm/cpufeature.h&gt;
 #include &lt;asm/fpu/xstate.h&gt;
+#include &lt;asm/fpu/xcr.h&gt;
 
 #ifdef CONFIG_X86_64
 DECLARE_PER_CPU(u64, xfd_state);
@@ -199,6 +200,47 @@ static inline void os_xrstor_supervisor(
 }
 
 /*
+ * XSAVE itself always writes all xfeatures.  Calculate a mask
+ * of features to skip some writes to reduce the XSAVE cost.
+ *
+ * This optimization is user-visible.  Only use it for states
+ * whose users can tolerate uninitialized sigframe state.
+ * Users must check XSTATE_BV to determine which features have
+ * been optimized out.
+ *
+ * Returns a mask of user features which may be optimized or
+ * are disabled.  This result needs to be masked against a list
+ * of enabled features again in order to do anything meaningful.
+ */
+static inline u64 xfeatures_sigframe_initopt(void)
+{
+	u64 xf_in_use = xfeatures_in_use();
+	u64 xf_maybe_opt;
+
+	/*
+	 * xf_in_use now has the set of XCR0-enabled xfeatures
+	 * which are not in the init state.
+	 */
+
+	/* Every sigframe state should be a supported user state: */
+	BUILD_BUG_ON(XFEATURE_MASK_SIGFRAME_INITOPT &amp;
+		     ~(XFEATURE_MASK_USER_SUPPORTED |
+		       XFEATURE_MASK_USER_DYNAMIC));
+
+	/*
+	 * Invert xf_in_use so bits are set for any feature
+	 * which is disabled or is in the init state.
+	 */
+	xf_maybe_opt = ~xf_in_use;
+
+	/*
+	 * Include only features for which the sigframe
+	 * optimization is valid.  May include disabled states.
+	 */
+	return xf_maybe_opt &amp; XFEATURE_MASK_SIGFRAME_INITOPT;
+}
+
+/*
  * Save xstate to user space xsave area.
  *
  * We don't use modified optimization because xrstor/xrstors might track
@@ -220,10 +262,15 @@ static inline int xsave_to_user_sigframe
 	 */
 	struct fpstate *fpstate = current-&gt;thread.fpu.fpstate;
 	u64 mask = fpstate-&gt;user_xfeatures;
-	u32 lmask = mask;
-	u32 hmask = mask &gt;&gt; 32;
+	u32 lmask;
+	u32 hmask;
 	int err;
 
+	/* Optimize out the XSAVE cost for a subset of xfeatures: */
+	mask &amp;= ~xfeatures_sigframe_initopt();
+
+	lmask = mask;
+	hmask = mask &gt;&gt; 32;
 	xfd_validate_state(fpstate, mask, false);
 
 	stac();
diff -puN arch/x86/include/asm/fpu/xstate.h~avoid-writing-amx-zeros arch/x86/include/asm/fpu/xstate.h
--- a/arch/x86/include/asm/fpu/xstate.h~avoid-writing-amx-zeros	2021-11-01 09:44:58.093736650 -0700
+++ b/arch/x86/include/asm/fpu/xstate.h	2021-11-01 13:56:39.269102143 -0700
@@ -92,6 +92,14 @@
 #define XFEATURE_MASK_FPSTATE	(XFEATURE_MASK_USER_RESTORE | \
 				 XFEATURE_MASK_SUPERVISOR_SUPPORTED)
 
+/*
+ * Features in this mask have space allocated in the signal frame, but may not
+ * have that space initialized when the feature is in its init state.
+ *
+ */
+#define XFEATURE_MASK_SIGFRAME_INITOPT	(XFEATURE_MASK_XTILE | \
+					 XFEATURE_MASK_USER_DYNAMIC)
+
 extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 
 extern void __init update_regset_xstate_info(unsigned int size,
diff -puN arch/x86/include/asm/fpu/xcr.h~avoid-writing-amx-zeros arch/x86/include/asm/fpu/xcr.h
--- a/arch/x86/include/asm/fpu/xcr.h~avoid-writing-amx-zeros	2021-11-01 13:56:45.205101894 -0700
+++ b/arch/x86/include/asm/fpu/xcr.h	2021-11-01 13:59:26.973095097 -0700
@@ -3,6 +3,7 @@
 #define _ASM_X86_FPU_XCR_H
 
 #define XCR_XFEATURE_ENABLED_MASK	0x00000000
+#define XCR_XFEATURE_IN_USE_MASK	0x00000001
 
 static inline u64 xgetbv(u32 index)
 {
@@ -20,4 +21,13 @@ static inline void xsetbv(u32 index, u64
 	asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index));
 }
 
+/*
+ * Return a mask of xfeatures which are currently being tracked
+ * by the processor as *not* being in the initial configuration.
+ */
+static inline u64 xfeatures_in_use(void)
+{
+	return xgetbv(XCR_XFEATURE_IN_USE_MASK);
+}
+
 #endif /* _ASM_X86_FPU_XCR_H */
_
</pre></body></html>