Machine description: shard-bmg-1

Result files: i915_display_info21, igt_runner21, results21.json, results21-xe-load.json, i915_display_info_post_exec21, boot21, dmesg21
| Detail | Value |
|---|---|
| Duration | unknown |
| Hostname | shard-bmg-1 |
| IGT-Version | IGT-Version: 1.30-g822e95b25 (x86_64) (Linux: 6.14.0-rc3-xe+ x86_64) |
Out:
Using IGT_SRANDOM=1740150991 for randomisation
Opened device: /dev/dri/card0
Starting subtest: sanity-after-timeout
Starting dynamic subtest: DRM_XE_ENGINE_CLASS_RENDER0
Dynamic subtest DRM_XE_ENGINE_CLASS_RENDER0: SUCCESS (6.131s)
This test caused an abort condition: Lockdep not active
/proc/lockdep_stats contents:
lock-classes: 2197 [max: 8192]
direct dependencies: 24522 [max: 524288]
indirect dependencies: 174127
all direct dependencies: 473386
dependency chains: 37013 [max: 524288]
dependency chain hlocks used: 163050 [max: 2621440]
dependency chain hlocks lost: 0
in-hardirq chains: 317
in-softirq chains: 754
in-process chains: 35942
stack-trace entries: 253576 [max: 524288]
number of stack traces: 11996
number of stack hash chains: 8543
combined max dependencies: 39620278
hardirq-safe locks: 98
hardirq-unsafe locks: 1328
softirq-safe locks: 228
softirq-unsafe locks: 1229
irq-safe locks: 240
irq-unsafe locks: 1328
hardirq-read-safe locks: 4
hardirq-read-unsafe locks: 400
softirq-read-safe locks: 9
softirq-read-unsafe locks: 395
irq-read-safe locks: 9
irq-read-unsafe locks: 400
uncategorized locks: 361
unused locks: 1
max locking depth: 21
max bfs queue depth: 375
max lock class index: 2196
debug_locks: 0
zapped classes: 3
zapped lock chains: 150
large chain blocks: 1
Starting dynamic subtest: DRM_XE_ENGINE_CLASS_COMPUTE0
Dynamic subtest DRM_XE_ENGINE_CLASS_COMPUTE0: SUCCESS (6.123s)
Subtest sanity-after-timeout: SUCCESS (12.255s)
Err:
Starting subtest: sanity-after-timeout
Starting dynamic subtest: DRM_XE_ENGINE_CLASS_RENDER0
Dynamic subtest DRM_XE_ENGINE_CLASS_RENDER0: SUCCESS (6.131s)
Starting dynamic subtest: DRM_XE_ENGINE_CLASS_COMPUTE0
Dynamic subtest DRM_XE_ENGINE_CLASS_COMPUTE0: SUCCESS (6.123s)
Subtest sanity-after-timeout: SUCCESS (12.255s)
Dmesg:
<6> [167.052544] Console: switching to colour dummy device 80x25
<6> [167.052797] [IGT] xe_exec_sip: executing
<6> [167.060683] [IGT] xe_exec_sip: starting subtest sanity-after-timeout
<6> [167.060978] [IGT] xe_exec_sip: starting dynamic subtest DRM_XE_ENGINE_CLASS_RENDER0
<7> [172.508395] xe 0000:03:00.0: [drm:xe_hw_engine_snapshot_capture [xe]] GT0: Proceeding with manual engine snapshot
<6> [173.151830] xe 0000:03:00.0: [drm] GT0: Engine reset: engine_class=rcs, logical_mask: 0x1, guc_id=2
<4> [173.152128]
<4> [173.152146] ======================================================
<4> [173.152159] WARNING: possible circular locking dependency detected
<4> [173.152171] 6.14.0-rc3-xe+ #1 Tainted: G W
<4> [173.152184] ------------------------------------------------------
<4> [173.152195] kworker/u64:44/4198 is trying to acquire lock:
<4> [173.152206] ffff88813959ab38 (&hwe->pf.lock){+.+.}-{2:2}, at: xe_guc_exec_queue_reset_handler+0xb5/0x200 [xe]
<4> [173.152549]
but task is already holding lock:
<4> [173.152560] ffff888139599428 (&ct->lock){+.+.}-{3:3}, at: receive_g2h+0x47/0x100 [xe]
<4> [173.152887]
which lock already depends on the new lock.
<4> [173.152905]
the existing dependency chain (in reverse order) is:
<4> [173.152920]
-> #2 (&ct->lock){+.+.}-{3:3}:
<4> [173.152943] xe_guc_ct_init+0x2b4/0x4c0 [xe]
<4> [173.153266] xe_guc_init+0xe9/0x360 [xe]
<4> [173.153583] xe_uc_init+0x1e/0x1f0 [xe]
<4> [173.153998] xe_gt_init_hwconfig+0x4c/0xb0 [xe]
<4> [173.154318] xe_device_probe+0x3ce/0x820 [xe]
<4> [173.154628] xe_pci_probe+0x35b/0x5f0 [xe]
<4> [173.154768] local_pci_probe+0x44/0xb0
<4> [173.154773] pci_device_probe+0xf4/0x270
<4> [173.154777] really_probe+0xee/0x3c0
<4> [173.154780] __driver_probe_device+0x8c/0x180
<4> [173.154783] driver_probe_device+0x24/0xd0
<4> [173.154786] __driver_attach+0x10f/0x220
<4> [173.154789] bus_for_each_dev+0x8d/0xf0
<4> [173.154792] driver_attach+0x1e/0x30
<4> [173.154794] bus_add_driver+0x151/0x290
<4> [173.154797] driver_register+0x5e/0x130
<4> [173.154800] __pci_register_driver+0x7d/0x90
<4> [173.154803] xe_register_pci_driver+0x23/0x30 [xe]
<4> [173.154864] soundcore_open+0x83/0x210 [soundcore]
<4> [173.154868] do_one_initcall+0x76/0x400
<4> [173.154871] do_init_module+0x97/0x2a0
<4> [173.154874] load_module+0x2c23/0x2f60
<4> [173.154877] init_module_from_file+0x97/0xe0
<4> [173.154879] idempotent_init_module+0x134/0x350
<4> [173.154882] __x64_sys_finit_module+0x77/0x100
<4> [173.154885] x64_sys_call+0x1f37/0x2650
<4> [173.154887] do_syscall_64+0x91/0x180
<4> [173.154891] entry_SYSCALL_64_after_hwframe+0x76/0x7e
<4> [173.154895]
-> #1 (fs_reclaim){+.+.}-{0:0}:
<4> [173.154899] fs_reclaim_acquire+0xc5/0x100
<4> [173.154903] __kmalloc_cache_noprof+0x58/0x490
<4> [173.154906] pf_queue_work_func+0x53e/0x5d0 [xe]
<4> [173.154962] process_one_work+0x21c/0x740
<4> [173.154965] worker_thread+0x1db/0x3c0
<4> [173.154968] kthread+0x10d/0x270
<4> [173.154971] ret_from_fork+0x44/0x70
<4> [173.154975] ret_from_fork_asm+0x1a/0x30
<4> [173.154978]
-> #0 (&hwe->pf.lock){+.+.}-{2:2}:
<4> [173.154981] __lock_acquire+0x1637/0x2810
<4> [173.154985] lock_acquire+0xc9/0x300
<4> [173.154988] _raw_spin_lock+0x2f/0x60
<4> [173.154991] xe_guc_exec_queue_reset_handler+0xb5/0x200 [xe]
<4> [173.155049] dequeue_one_g2h+0x4e5/0x900 [xe]
<4> [173.155106] receive_g2h+0x4f/0x100 [xe]
<4> [173.155161] g2h_worker_func+0x15/0x20 [xe]
<4> [173.155217] process_one_work+0x21c/0x740
<4> [173.155219] worker_thread+0x1db/0x3c0
<4> [173.155222] kthread+0x10d/0x270
<4> [173.155224] ret_from_fork+0x44/0x70
<4> [173.155227] ret_from_fork_asm+0x1a/0x30
<4> [173.155230]
other info that might help us debug this:
<4> [173.155233] Chain exists of:
&hwe->pf.lock --> fs_reclaim --> &ct->lock
<4> [173.155238] Possible unsafe locking scenario:
<4> [173.155241] CPU0 CPU1
<4> [173.155243] ---- ----
<4> [173.155245] lock(&ct->lock);
<4> [173.155247] lock(fs_reclaim);
<4> [173.155250] lock(&ct->lock);
<4> [173.155253] lock(&hwe->pf.lock);
<4> [173.155255]
*** DEADLOCK ***
<4> [173.155258] 3 locks held by kworker/u64:44/4198:
<4> [173.155260] #0: ffff88812ba44d48 ((wq_completion)xe-g2h-wq){+.+.}-{0:0}, at: process_one_work+0x444/0x740
<4> [173.155266] #1: ffffc9000af1fe20 ((work_completion)(&ct->g2h_worker)){+.+.}-{0:0}, at: process_one_work+0x1da/0x740
<4> [173.155273] #2: ffff888139599428 (&ct->lock){+.+.}-{3:3}, at: receive_g2h+0x47/0x100 [xe]
<4> [173.155331]
stack backtrace:
<4> [173.155334] CPU: 1 UID: 0 PID: 4198 Comm: kworker/u64:44 Tainted: G W 6.14.0-rc3-xe+ #1
<4> [173.155335] Tainted: [W]=WARN
<4> [173.155336] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 1645 03/15/2024
<4> [173.155337] Workqueue: xe-g2h-wq g2h_worker_func [xe]
<4> [173.155391] Call Trace:
<4> [173.155392] <TASK>
<4> [173.155393] dump_stack_lvl+0x91/0xf0
<4> [173.155395] dump_stack+0x10/0x20
<4> [173.155397] print_circular_bug+0x285/0x360
<4> [173.155399] check_noncircular+0x150/0x170
<4> [173.155400] ? __irq_work_queue_local+0x43/0x120
<4> [173.155403] __lock_acquire+0x1637/0x2810
<4> [173.155405] ? dev_printk_emit+0xa1/0xe0
<4> [173.155407] lock_acquire+0xc9/0x300
<4> [173.155409] ? xe_guc_exec_queue_reset_handler+0xb5/0x200 [xe]
<4> [173.155464] ? __dev_printk+0x39/0xa0
<4> [173.155465] ? _dev_info+0x75/0xa0
<4> [173.155467] _raw_spin_lock+0x2f/0x60
<4> [173.155468] ? xe_guc_exec_queue_reset_handler+0xb5/0x200 [xe]
<4> [173.155523] xe_guc_exec_queue_reset_handler+0xb5/0x200 [xe]
<4> [173.155577] dequeue_one_g2h+0x4e5/0x900 [xe]
<4> [173.155630] ? receive_g2h+0x47/0x100 [xe]
<4> [173.155684] receive_g2h+0x4f/0x100 [xe]
<4> [173.155738] g2h_worker_func+0x15/0x20 [xe]
<4> [173.155791] process_one_work+0x21c/0x740
<4> [173.155793] worker_thread+0x1db/0x3c0
<4> [173.155795] ? __pfx_worker_thread+0x10/0x10
<4> [173.155796] kthread+0x10d/0x270
<4> [173.155797] ? __pfx_kthread+0x10/0x10
<4> [173.155798] ret_from_fork+0x44/0x70
<4> [173.155799] ? __pfx_kthread+0x10/0x10
<4> [173.155800] ret_from_fork_asm+0x1a/0x30
<4> [173.155803] </TASK>
<7> [173.156089] xe 0000:03:00.0: [drm:guc_exec_queue_timedout_job [xe]] GT0: Check job timeout: seqno=4294967169, lrc_seqno=4294967169, guc_id=2, running_time_ms=0, timeout_ms=5000, diff=0x00000000
<5> [173.156164] xe 0000:03:00.0: [drm] GT0: Timedout job: seqno=4294967169, lrc_seqno=4294967169, guc_id=2, flags=0x0 in xe_exec_sip [4546]
<6> [173.179059] xe 0000:03:00.0: [drm] Xe device coredump has been created
<6> [173.179062] xe 0000:03:00.0: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
<6> [173.192268] [IGT] xe_exec_sip: finished subtest DRM_XE_ENGINE_CLASS_RENDER0, SUCCESS
<6> [173.192522] [IGT] xe_exec_sip: starting dynamic subtest DRM_XE_ENGINE_CLASS_COMPUTE0
<6> [179.293652] xe 0000:03:00.0: [drm] GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=2
<7> [179.293850] xe 0000:03:00.0: [drm:guc_exec_queue_timedout_job [xe]] GT0: Check job timeout: seqno=4294967169, lrc_seqno=4294967169, guc_id=2, running_time_ms=0, timeout_ms=5000, diff=0x00000000
<5> [179.294260] xe 0000:03:00.0: [drm] GT0: Timedout job: seqno=4294967169, lrc_seqno=4294967169, guc_id=2, flags=0x0 in xe_exec_sip [4546]
<7> [179.294288] xe 0000:03:00.0: [drm:xe_devcoredump [xe]] Multiple hangs are occurring, but only the first snapshot was taken
<6> [179.315945] [IGT] xe_exec_sip: finished subtest DRM_XE_ENGINE_CLASS_COMPUTE0, SUCCESS
<6> [179.316191] [IGT] xe_exec_sip: finished subtest sanity-after-timeout, SUCCESS
<7> [179.323293] xe 0000:03:00.0: [drm:drm_client_dev_restore] intel-fbdev: ret=0
<6> [179.323559] [IGT] xe_exec_sip: exiting, ret=0
<6> [179.340193] Console: switching to colour frame buffer device 240x67