[D] Real-time Student Attention Detection: ResNet vs Facial Landmarks - Which approach for resource-constrained deployment?
I have a problem statement where we are supposed to detect the attention level of students in a classroom, basically outputting whether a student is engaged, confused, or bored. We are trying to decide which approach to choose. To explain the facial landmarks approach, this is what my Claude says:
Facial landmarks are specific coordinate points (x, y) that map key features on a face. The standard model uses 68 points that outline the jawline, eyebrows, eyes, nose, and mouth. This approach has roots in traditional computer vision and is based on geometric measurements rather than pixel patterns.
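To make the "geometric measurements" idea concrete, here is a minimal sketch of one widely used landmark feature, the eye aspect ratio (EAR): a ratio of eyelid-gap distances to eye width computed from the six eye landmarks in the 68-point scheme. The point coordinates below are synthetic, for illustration only; in a real system they would come from a landmark detector such as dlib or MediaPipe.

```python
from math import dist

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks around one eye, ordered as in the
    68-point scheme: corner, two upper-lid points, corner, two
    lower-lid points."""
    # Vertical distances between the upper/lower eyelid point pairs
    a = dist(eye[1], eye[5])
    b = dist(eye[2], eye[4])
    # Horizontal distance between the eye corners
    c = dist(eye[0], eye[3])
    return (a + b) / (2.0 * c)

# Synthetic "open" eye: large vertical gaps -> high EAR
open_eye = [(0, 0), (1, 2), (3, 2), (4, 0), (3, -2), (1, -2)]
# Synthetic nearly closed eye: tiny vertical gaps -> EAR near 0
closed_eye = [(0, 0), (1, 0.2), (3, 0.2), (4, 0), (3, -0.2), (1, -0.2)]

print(eye_aspect_ratio(open_eye))    # 1.0
print(eye_aspect_ratio(closed_eye))  # 0.1
```

A drop in EAR over consecutive frames is a cheap, interpretable drowsiness/blink cue, which is exactly why the geometric approach appeals for resource-constrained deployment.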
Based on this recent paper: [The first look: a biometric analysis of emotion recognition using key facial features](https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2025.1554320/full)
The paper used **eye-tracking on 30 participants** to scientifically determine which facial regions humans actually look at when recognizing emotions:
- **Finding:** People focus primarily on the eyes (especially left eye first) and mouth
- **Innovation:** Reduced the standard 68 landmarks to just **24 critical points** (eyes + mouth)
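The reduction step is just index selection over the full landmark set. The sketch below keeps the conventional eye and mouth index ranges from the iBUG 300-W 68-point annotation (eyes 36-47, mouth 48-67); note these ranges give 32 points, and the paper's exact 24-point subset is not reproduced here, so treat the indices as illustrative assumptions.

```python
# Conventional index ranges in the 68-point (iBUG 300-W) scheme.
# NOTE: illustrative only; the paper's exact 24-point subset may differ.
EYE_IDX = list(range(36, 48))    # both eyes, 12 points
MOUTH_IDX = list(range(48, 68))  # outer + inner lips, 20 points

def reduce_landmarks(landmarks68, keep=tuple(EYE_IDX + MOUTH_IDX)):
    """Keep only eye/mouth coordinates from a full 68-point set."""
    return [landmarks68[i] for i in keep]

full = [(i, i) for i in range(68)]  # dummy 68-point landmark set
reduced = reduce_landmarks(full)
print(len(reduced))  # 32
```

Feeding only this subset to a downstream classifier shrinks the feature vector roughly in half, which matters on low-power classroom hardware.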
Another one: Deep Learning (ResNet/CNN)
- ResNet model for facial emotion recognition
- Feed raw facial images → CNN processes them → output an emotion classification.