...

Code Block
/**
 * A UDF class that supports string concatenation for Spark SQL.
 */
public class StringConcatonator {

    /**
     * This UDF returns the concatenation of two strings.
     */
    public String concat(String firstString, String secondString) {
        return firstString + secondString;
    }
}
Info
  • Apache Spark does not support primitive data type returns. Therefore, all the methods in a POJO class should return the wrapper class of the corresponding primitive data type.

    For example, a method that adds two integers should be defined as shown below.

    Code Block
    public Integer addNumbers(Integer a, Integer b)
    {
        return a + b;
    }

  • Method overloading for UDFs is not supported. Different UDFs should have different method names for the expected behaviour.
  • If a UDF method uses a data type that Apache Spark does not support, the following error appears when you start the DAS server.

    Code Block
    Error initializing analytics executor: Cannot determine the return DataType

    For a list of return types that Apache Spark supports, see the Data Types section of the Spark SQL, DataFrames and Datasets Guide.

    If you need to use one or more methods that are not UDF methods, and they contain return types that are not supported for Apache Spark, you can use a separate class to define them. This class does not have to be added to the <DAS_HOME>/repository/conf/analytics/spark/spark-udf-config.xml file.
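The points in the note above can be sketched together in one example. This is a minimal illustration, not code from the DAS distribution: the class names `NumberUdfs` and `UdfHelpers` and all method names are hypothetical. It assumes that only the UDF class would be listed in spark-udf-config.xml, so the helper class is free to use types that Spark cannot map.

```java
// Sketch only: all class and method names here are hypothetical.
// In a real project each class would normally be public and live in its own source file.

/** UDF class: every method returns a wrapper type and has a unique name. */
class NumberUdfs {

    /** Returns an Integer (wrapper), not an int (primitive). */
    public Integer addIntegers(Integer a, Integer b) {
        return a + b;
    }

    /** A distinct name is used instead of overloading addIntegers. */
    public Double addDoubles(Double a, Double b) {
        return a + b;
    }

    /** Delegates to the helper class; only the String result is returned by the UDF. */
    public String joinWithSeparator(String first, String second) {
        return UdfHelpers.join(first, second, '-').toString();
    }
}

/** Helper class: not a UDF class, so it is assumed to stay out of the UDF configuration. */
class UdfHelpers {

    /** Returns a StringBuilder, a type a UDF method itself should not return. */
    static StringBuilder join(String first, String second, char separator) {
        return new StringBuilder(first).append(separator).append(second);
    }
}
```

Keeping the helper in its own class means its `StringBuilder` return type and `char` parameter stay in a class that is never listed as a UDF class, while each method in `NumberUdfs` returns a wrapper type or a `String`.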

 

Step 2: Package the class in a jar

...