Method

We introduce a novel distillation procedure that transfers the distinctive properties of a slow teacher encoder to a faster student encoder. We term our approach object-oriented because training supervision is provided at the object level.
First, we extract teacher descriptors $\mathcal{F}^Q=\Phi_\Theta(\mathcal{P}^Q), \mathcal{F}^T=\Phi_\Theta(\mathcal{P}^T)$ using the GeDi encoder $\Phi_\Theta$. Then, we learn student descriptors $\mathcal{G}^Q=\Psi_\Omega(\mathcal{P}^Q), \mathcal{G}^T=\Psi_\Omega(\mathcal{P}^T)$ with a PTV3 encoder $\Psi_\Omega$. During distillation, we optimize the parameters $\Omega$ so that $\mathcal{G}^Q \approx \mathcal{F}^Q$ and $\mathcal{G}^T \approx \mathcal{F}^T$, while the teacher parameters $\Theta$ remain frozen. Note that this differs from online knowledge distillation, in which both the teacher and the student are neural networks whose parameters $\Theta$ and $\Omega$ are learned simultaneously.
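For concreteness, the following PyTorch sketch illustrates a single distillation step under the assumptions stated in the comments: the teacher descriptors are precomputed and detached, and gradients flow only into the student parameters $\Omega$. The function and variable names (e.g., `distill_step`, `student`) are illustrative and not part of our implementation.

```python
# Minimal sketch of one offline distillation step (PyTorch). We assume the
# teacher descriptors F (GeDi) are precomputed for the point cloud, and the
# student Psi_Omega is any per-point encoder mapping (N, 3) points to (N, D)
# descriptors. Names here are illustrative, not the paper's API.
import torch
import torch.nn.functional as F


def distill_step(student, optimizer, points, teacher_feats):
    """Make student descriptors match the frozen teacher descriptors.

    points:        (N, 3) query or target point cloud P
    teacher_feats: (N, D) precomputed GeDi descriptors for the same points
    """
    student.train()
    optimizer.zero_grad()

    pred = student(points)                        # (N, D) student descriptors G
    # L2 distance between student and (detached) teacher descriptors,
    # averaged over points and channels.
    loss = F.mse_loss(pred, teacher_feats.detach())

    loss.backward()                               # gradients reach only Omega
    optimizer.step()
    return loss.item()
```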
In our case, the architectures of $\Phi_\Theta$ and $\Psi_\Omega$ are distinct. Teacher features are precomputed, stored in memory, and loaded as needed during distillation. To reduce the cost of this process, we store teacher features only for query objects and introduce a module that leverages ground-truth 6D poses to transfer them to target objects. Moreover, we propose a custom loss function that focuses learning on noise-free points, leading to improved performance.
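The sketch below illustrates the feature-transfer idea under stated assumptions: given the ground-truth pose $(R, t)$ aligning the query object to the target object, each target point inherits the teacher descriptor of its nearest transformed query point, and points farther than a threshold from any transformed query point are treated as noise and excluded from the loss. The names (`transfer_teacher_features`, `tau`) and the nearest-neighbor assignment are assumptions for illustration, not the exact module or loss used in the paper.

```python
# Transfer precomputed teacher descriptors from the query object to the
# target object via the ground-truth 6D pose, then compute a masked L2 loss
# on the points deemed noise-free. Illustrative sketch, not the paper's API.
import torch


def transfer_teacher_features(query_pts, query_feats, target_pts, R, t, tau=0.01):
    """
    query_pts:   (Nq, 3) query point cloud P^Q
    query_feats: (Nq, D) precomputed teacher descriptors F^Q
    target_pts:  (Nt, 3) target point cloud P^T
    R, t:        (3, 3), (3,) ground-truth rotation and translation (query -> target)
    Returns transferred descriptors (Nt, D) and a boolean mask of reliable points.
    """
    # Bring the query points into the target frame with the ground-truth pose.
    query_in_target = query_pts @ R.T + t                  # (Nq, 3)

    # Nearest transformed query point for every target point.
    dists = torch.cdist(target_pts, query_in_target)       # (Nt, Nq)
    nn_dist, nn_idx = dists.min(dim=1)

    transferred = query_feats[nn_idx]                      # (Nt, D)
    valid = nn_dist < tau                                  # mask out noisy points
    return transferred, valid


def masked_distill_loss(student_feats, transferred_feats, valid):
    """L2 loss computed only on points considered noise-free."""
    diff = (student_feats - transferred_feats.detach())[valid]
    return (diff ** 2).mean()
```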