How to serialize/deserialize gradient boosting trees

Hello,

I am trying to train a gradient boosting trees regression model, then serialize/deserialize the result, and then run prediction with it.

When I don't serialize/deserialize between training and prediction, it works. But unfortunately, when I (de)serialize, I get this error:

unknown file: error: C++ exception with description "Number of columns in numeric table is incorrect
Details:
Argument name: data
" thrown in the test body

I wrote the following; can you help me find where the problem is, please?

 const NumericTablePtr featureSamplesTable(new HomogenNumericTable<double>(trainingFeatureSamples, nTrainingFeatures, nTrainingSamples));
  const NumericTablePtr targetValuesTable(new HomogenNumericTable<double>(trainingTargetValues, 1, nTrainingSamples));

  algorithms::gbt::regression::training::Batch<> algorithm;

  algorithm.input.set(algorithms::gbt::regression::training::data, featureSamplesTable);
  algorithm.input.set(algorithms::gbt::regression::training::dependentVariable, targetValuesTable);

  // Gradient Boosted Trees model config
  algorithm.parameter().maxIterations = 66;

  // Train Gradient Boosted Trees model
  const bool status = algorithm.computeNoThrow().ok();

  if (status)
  {
    const SharedPtr<algorithms::gbt::regression::training::Result> result = algorithm.getResult();

    // Serialize result 
    InputDataArchive dataArch;
    result->serialize(dataArch);
    auto length = dataArch.getSizeOfArchive();
    auto buffer = new byte[length];
    dataArch.copyArchiveToArray(buffer, length);

    // Deserialize and Evaluate model

    const NumericTablePtr testFeatureSamplesTable(new HomogenNumericTable<double>(testFeatureSamples, nTestFeatures, nTestSamples));

    algorithms::gbt::regression::prediction::Batch<> algorithmEval;

    algorithmEval.input.set(algorithms::gbt::regression::prediction::data, testFeatureSamplesTable);

    // Deserialize result
    SharedPtr<algorithms::gbt::regression::training::Result> trainingResult(new algorithms::gbt::regression::training::Result());
    OutputDataArchive dataArch2(buffer, length);
    trainingResult->deserialize(dataArch2);
    auto trainingModel = trainingResult->get(algorithms::gbt::regression::training::model);

    algorithmEval.input.set(algorithms::gbt::regression::prediction::model, trainingModel);

    // Predict values of gradient boosted trees regression
    const bool statusEval = algorithmEval.computeNoThrow().ok();

    // Retrieve the algorithm results
    if (statusEval)
    {
      const SharedPtr<algorithms::gbt::regression::prediction::Result> resultEval = algorithmEval.getResult();
      NumericTablePtr predictionResult = resultEval->get(algorithms::gbt::regression::prediction::prediction);
      // ...
    }
  }

Thanks a lot,

Sandra


Hi Sandra,

What version of DAAL are you using? We tested your code on the DAAL 2019 Update 2 release, and it works correctly in both cases: with serialization and without.

One possible reason for this exception is a mismatch between the number of features in the training and test data sets, i.e., in the trainingFeatureSamples and testFeatureSamples variables. They must have the same number of features. What dimensions do you use?

Also, sending all of your code would be helpful (if possible, of course).

Thanks for your response. Actually, I am using version 2018 Update 3.

Here is my simple test case and the corresponding code:

TEST(DaalTests, GradientBoostedTreesTest)
{
  int nRows = 1000;
  int nCols = 1;
  double * features = new double[nRows * nCols];
  double * labels = new double[nRows];
  for (int i = 0; i < nRows; ++i)
  {
    for (int j = 0; j < nCols; ++j)
    {
      features[i * nCols + j] = i + 1.0;
    }

    if (i < 500)
    {
      labels[i] = 33.0;
    }
    else
    {
      labels[i] = 55.0;
    }
    
  }

  double * predictedValues = new double[nRows];
  int status = DataAnaltyicslFunction::test(features, nCols, labels, nRows, predictedValues, features, nCols, nRows);

  ASSERT_EQ(0, status);

  ASSERT_NEAR(33.0, predictedValues[100], 1e-5);
  ASSERT_NEAR(55.0, predictedValues[600], 1e-5);

  delete[] features;
  delete[] labels;
  delete[] predictedValues;
}

 

int DataAnaltyicslFunction::test(double* const trainingFeatureSamples, int nTrainingFeatures, double* const trainingTargetValues, int nTrainingSamples, double* predictedValues, double* const testFeatureSamples, int nTestFeatures, int nTestSamples)
{
  const NumericTablePtr featureSamplesTable(new HomogenNumericTable<double>(trainingFeatureSamples, nTrainingFeatures, nTrainingSamples));
  const NumericTablePtr targetValuesTable(new HomogenNumericTable<double>(trainingTargetValues, 1, nTrainingSamples));

  algorithms::gbt::regression::training::Batch<> algorithm;

  algorithm.input.set(algorithms::gbt::regression::training::data, featureSamplesTable);
  algorithm.input.set(algorithms::gbt::regression::training::dependentVariable, targetValuesTable);

  // Gradient Boosted Trees model config
  algorithm.parameter().maxIterations = 66;

  // Train Gradient Boosted Trees model
  const bool status = algorithm.computeNoThrow().ok();

  if (status)
  {
    const SharedPtr<algorithms::gbt::regression::training::Result> result = algorithm.getResult();

    // Serialize result 
    InputDataArchive dataArch;
    result->serialize(dataArch);
    auto length = dataArch.getSizeOfArchive();
    auto buffer = new byte[length];
    dataArch.copyArchiveToArray(buffer, length);

    // Deserialize and Evaluate model

    const NumericTablePtr testFeatureSamplesTable(new HomogenNumericTable<double>(testFeatureSamples, nTestFeatures, nTestSamples));

    algorithms::gbt::regression::prediction::Batch<> algorithmEval;

    algorithmEval.input.set(algorithms::gbt::regression::prediction::data, testFeatureSamplesTable);

    // Deserialize result
    SharedPtr<algorithms::gbt::regression::training::Result> trainingResult(new algorithms::gbt::regression::training::Result());
    OutputDataArchive dataArch2(buffer, length);
    trainingResult->deserialize(dataArch2);
    auto trainingModel = trainingResult->get(algorithms::gbt::regression::training::model);

    algorithmEval.input.set(algorithms::gbt::regression::prediction::model, trainingModel);

    // Predict values of gradient boosted trees regression
    const bool statusEval = algorithmEval.computeNoThrow().ok();

    // Retrieve the algorithm results
    if (statusEval)
    {
      const SharedPtr<algorithms::gbt::regression::prediction::Result> resultEval = algorithmEval.getResult();
      NumericTablePtr predictionResult = resultEval->get(algorithms::gbt::regression::prediction::prediction);

      BlockDescriptor<double> block;
      predictionResult->getBlockOfRows(0, predictionResult->getNumberOfRows(), readOnly, block);
      memcpy(predictedValues, block.getBlockPtr(), block.getNumberOfRows() * sizeof(double));
      predictionResult->releaseBlockOfRows(block);
    }
    delete[] buffer;
    return statusEval ? 0 : 1;
  }
  return status ? 0 : 1;
}

Thanks a lot for your help,

Sandra

 

 

Hi Sandra,

We reproduced the issue with your code on DAAL 2018 Update 3. It has been fixed in the latest versions of DAAL.

In your case, I suggest downloading the prebuilt DAAL 2019 Update 1.1 from GitHub: https://github.com/intel/daal/releases/tag/2019_u1.1

This version solves your issue and additionally contains special performance optimizations for the training stage of Gradient Boosted Trees, which I think should be useful for you.

DAAL 2019 Update 2 also fixes your problem but doesn't have these optimizations. If that works for you, you can get it from the official site or GitHub.

Also, DAAL 2019 Update 3 is coming; it will contain the optimizations and will be available through all our distribution channels.

 

Thanks,
Alexey

Hi Alexey,

Thanks a lot for your response. It is very helpful.

Ok, we will use Update 1.1 while waiting for Update 3.

 

And do you have an idea of when Update 3 will be released, please?

Thanks again

Intel DAAL Update 3 will come at the end of Q2. We will inform you about it in this thread.

Let us know if you have additional questions.

 

Best regards,
Alexey

Hi Sandra,

Intel DAAL 2019 Update 3 has been released on GitHub.

Let us know if you need help with Intel DAAL.

Thanks a lot!

Hi Sandra,

Intel DAAL v.2019 u3 is now available and ready for download. The fix for this issue is included in this update. Could you check it and let us know the results?

Thanks, Gennady
