Optimising Facial Recognition Accuracy

We want to see how far we can push the accuracy of the algorithm. With this in mind, there are a few questions I'd love to find answers to:

  • When a model is created, what data does it contain? (I notice it's 24kB when serialised) Is it based on a single frame, or many?
  • Does FR make use of video hardware acceleration?
  • Are there any camera characteristics that will produce better results? Higher resolution = higher accuracy? Does a 3D camera (Senz) make any difference in recognition?
  • What's the trade-off between 6-point and 7-point? Whilst this is configurable for Landmarking, does it make any difference to the Recognition?
  • Are there any practical tips to maximize detection accuracy? (e.g. multi-frame voting or multi-model matching)
  • What practically is the expected real-world accuracy of the algorithm?

I appreciate your work.


Here are the answers.

When a model is created, what data does it contain? (I notice it's 24kB when serialised) Is it based on a single frame, or many?

We cannot disclose the model data, but it is feature-extractor data that is unique to that face, and it is based on a single face. However, you can collect multiple face profiles, store them in an array, and then use Compare(Model **models, pxcU32 nmodels, pxcF32 *scores, pxcU32 *index) to compare the current face against the array of models.
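
For illustration, here is a rough C++ sketch of that pattern. Only the Compare() signature comes from the answer above; the FaceModel stand-in, the dummy Compare() body, and the assumption that Compare() is called on the current face's model are not SDK documentation.

    #include <vector>

    typedef unsigned int pxcU32;   // stand-ins for the SDK's pxcU32/pxcF32 typedefs
    typedef float        pxcF32;   // when the real headers are not included

    // Stand-in for the SDK's recognition model. Only the Compare() signature is
    // quoted from the answer above; the body is a dummy so the sketch compiles.
    struct FaceModel {
        int Compare(FaceModel **models, pxcU32 nmodels, pxcF32 *scores, pxcU32 *index)
        {
            (void)models;                                      // unused in the dummy body
            for (pxcU32 i = 0; i < nmodels; ++i) scores[i] = 0.0f;
            *index = 0;
            return 0;                                          // the real SDK does the actual comparison
        }
    };

    // Compare the face seen in the current frame against an array of stored models.
    // Returns the index of the closest model; scores[] receives the per-model values.
    int MatchAgainstGallery(FaceModel *current, std::vector<FaceModel*> &gallery,
                            std::vector<pxcF32> &scores)
    {
        if (gallery.empty()) return -1;
        scores.assign(gallery.size(), 0.0f);
        pxcU32 best = 0;
        current->Compare(&gallery[0], (pxcU32)gallery.size(), &scores[0], &best);
        return (int)best;
    }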

Does FR make use of video hardware acceleration?

No.

Are there any camera characteristics that will produce better results? Higher resolution = higher accuracy? Does a 3D camera (Senz) make any difference in recognition?

The recognition does not use any of the depth data, so 3D will not make any difference. Higher resolution that captures more facial detail can give better results. More importantly, creating face models that represent all viewing angles of the face will also improve results.

What's the trade-off between 6-point and 7-point? Whilst this is configurable for Landmarking, does it make any difference to the Recognition?

The choice of 6 or 7 points does not make any difference to face recognition. In landmark detection, however, the 6-point configuration is a 2D model while the 7-point configuration is a 3D model, so the two are different.

Are there any practical tips to maximize detection accuracy? (e.g. multi-frame voting or multi-model matching)

This varies depending on the usage scenario or usage model that you are targeting. No single face detection algorithm can satisfy all usage models.

What practically is the expected real-world accuracy of the algorithm?

What do you mean by "real-world accuracy of the algorithm"?

 

Thanks

Thanks David,

We're investigating the possibility of using the technology in a "Unique Person Counter" at an event venue. The aim of the investigation is to see whether the technology is mature enough yet for this kind of scenario.

Do you feel the technology is mature enough to see 90%+ accuracy in this context?

Early tests are showing positive results if we:

  • Instead of identifying a face based on a single model, capture a number of models over time and use them to "vote" for an identity (see the sketch after this list)
  • If no clear match is found, create a new identity based on the best model that was captured.
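
To make the idea concrete, here's a rough sketch of the voting step. The matchIdentity() and enrolNewIdentity() helpers, the FaceModel stand-in, and the vote threshold are all hypothetical placeholders for however the matching and enrolment end up being implemented; only the overall strategy is what I described above.

    #include <cstddef>
    #include <map>
    #include <vector>

    struct FaceModel;   // stand-in for the SDK's recognition model

    // Hypothetical helpers: match one captured model against the known identities
    // (returning -1 when nothing clears the match threshold), and enrol a new identity.
    int matchIdentity(FaceModel *model);
    int enrolNewIdentity(FaceModel *bestModel);

    // Vote across several models captured over time; fall back to enrolling a new
    // identity from the best captured model when there is no clear winner.
    int identifyByVoting(const std::vector<FaceModel*> &captured,
                         FaceModel *bestModel, std::size_t minVotes)
    {
        std::map<int, std::size_t> votes;
        for (FaceModel *m : captured) {
            int id = matchIdentity(m);
            if (id >= 0) ++votes[id];
        }

        int bestId = -1;
        std::size_t bestCount = 0;
        for (const auto &kv : votes)
            if (kv.second > bestCount) { bestId = kv.first; bestCount = kv.second; }

        if (bestId >= 0 && bestCount >= minVotes)
            return bestId;                    // confident, voted match
        return enrolNewIdentity(bestModel);   // no clear match: create a new identity
    }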

Based on your feedback, I'll test maintaining a number of models for each identity - perhaps using the Location feedback data to determine the posture, and capturing the best for each.

 

One more question that is on my mind: in contexts where the CPU becomes constrained, does the algorithm decrease in accuracy, or just speed?

I think your idea is pretty good and it should increase the recognition rate. If the CPU becomes constrained, the recognition accuracy will not change, only the speed.

Thanks David,

In small lab tests it seems that using multiple models per person actually decreases accuracy (increases false positives). As an alternative, I'm considering positioning the camera in such a way that we will see similar perspectives on people's faces (in a doorway, for example, where people will generally be facing in a consistent direction). This appears to increase the detection accuracy. 

In terms of learning, however, there is still the challenge of selecting the best model to use. I've had reasonable success by collecting a number of models for each face (say, 50 models) and then comparing each model against the others. The model with the best overall comparison to the other models is then selected for learning.
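
In case a sketch makes that clearer, this is roughly what the selection step looks like. compareModels() is a placeholder for a pairwise call built on the SDK's Compare(), and I'm assuming a higher score means a closer match:

    #include <cstddef>
    #include <vector>

    struct FaceModel;   // stand-in for the SDK's recognition model

    // Placeholder for a pairwise similarity call built on the SDK's Compare();
    // assumes a higher score means a closer match.
    float compareModels(FaceModel *a, FaceModel *b);

    // Pick the captured model that is most similar to all of the others
    // (effectively the medoid of the ~50 captured models) and use it for learning.
    FaceModel* selectRepresentative(const std::vector<FaceModel*> &models)
    {
        FaceModel *best = nullptr;
        float bestTotal = 0.0f;
        for (std::size_t i = 0; i < models.size(); ++i) {
            float total = 0.0f;
            for (std::size_t j = 0; j < models.size(); ++j)
                if (i != j) total += compareModels(models[i], models[j]);
            if (best == nullptr || total > bestTotal) { bestTotal = total; best = models[i]; }
        }
        return best;
    }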

Would you be able to advise if there is a better approach to this?

 

I've noticed as well that the most expensive operation is createModel(..). Are there any ways to optimize the speed of this operation? Does it intrinsically use multiple threads? Or is it best to perform multiple createModel(..) calls in parallel on different threads?
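
For context, what I'm experimenting with at the moment looks roughly like the sketch below: pushing createModel(..) onto a worker thread with std::async so the capture loop stays responsive. buildModel() and FaceData are placeholders for however the SDK call gets wrapped, and I'm not assuming the SDK is safe to call from several threads at once.

    #include <future>

    struct FaceModel;   // stand-in for the SDK's recognition model
    struct FaceData;    // hypothetical handle to the face detected in the current frame

    // Placeholder wrapper around the SDK's createModel(..) call.
    FaceModel* buildModel(FaceData *face);

    // Run the expensive model creation off the capture thread; call get() on the
    // returned future only when the model is actually needed.
    std::future<FaceModel*> startModelJob(FaceData *face)
    {
        return std::async(std::launch::async, [face] { return buildModel(face); });
    }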

 

 
