Posts

Showing posts from December, 2023

My Research Paper Published in IEEE

 Hi All, I just wanted to share my research paper, which was published back in Oct 2021. It is about a web-based app that estimates the nitrogen concentration of a rice leaf using a Leaf Color Chart (LCC). The LCC is a tool used to gauge nitrogen concentration from leaf color, and the web/Android app that I made digitizes it. The approach was to calculate the average RGB and get the HEX color code for every shade in the LCC. You can read more about the LCC here: https://iiss.icar.gov.in/eMagazine/v3i1/9.pdf The paper is on the IEEE Xplore website: https://ieeexplore.ieee.org/document/9555875/ Thanks
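To make the average-RGB-to-HEX step concrete, here is a minimal Python sketch. It is not the code from the paper or the app: the function names `average_rgb` and `rgb_to_hex` are made up for illustration, and the pixel values are arbitrary sample numbers, not real LCC shades.

```python
def average_rgb(pixels):
    """Average a list of (r, g, b) tuples channel by channel.

    Assumption: pixels are 0-255 integer triples sampled from one
    shade of the chart (or from a leaf photo).
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    count = len(pixels)
    # Sum each channel separately, then round to the nearest integer.
    return tuple(round(sum(p[i] for p in pixels) / count) for i in range(3))


def rgb_to_hex(rgb):
    """Format an (r, g, b) triple as an uppercase #RRGGBB string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Arbitrary sample pixels, purely illustrative.
sample_pixels = [(0, 128, 0), (0, 130, 0), (6, 129, 3), (6, 129, 1)]
shade = average_rgb(sample_pixels)
print(shade, rgb_to_hex(shade))
```

Once every shade of the chart has a HEX code like this, a leaf's averaged color can be compared against them.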

Generate PySpark Schema dynamically in Python from JSON Sample

 Hi Folks, If you need to generate a PySpark schema from JSON, you can always use my tool here: https://preetranjan.github.io/pyspark-schema-generator/ But if you need to do it in Python, here is the code snippet for it. It takes a Python dictionary as input and generates the PySpark schema.

import json
from pyspark.sql.types import *

def GeneratePySparkSchema(json):
    fields = []
    for key, value in json.items():
        if isinstance(value, dict):
            field = StructField(key, GeneratePySparkSchema(value), True)
        elif isinstance(value, list):
            if len(value) == 0:
                field = StructField(key, ArrayType(StringType()), True)
            elif isinstance(value[0], dict):
                field = StructField(
                    key, ArrayType(GeneratePySparkSchema(value[0]), True)
                )
            else:
                field = StructField(key, ArrayType(GetSparkData
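As an alternative sketch of the same idea that runs without PySpark installed, the recursion can emit a DDL-style schema string instead of `StructType` objects. This is a different technique from the snippet above, not a completion of it. The function names `spark_type_ddl` and `generate_ddl_schema` are hypothetical, the type mapping (int to `bigint`, float to `double`, everything else to `string`) is an assumption, and keys are assumed to be plain identifiers that need no quoting.

```python
def spark_type_ddl(value):
    """Map one sample JSON value to a Spark SQL type name (assumed mapping)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        # Nested objects become struct<name:type,...>.
        inner = ",".join(f"{k}:{spark_type_ddl(v)}" for k, v in value.items())
        return f"struct<{inner}>"
    if isinstance(value, list):
        # Like the snippet above, infer the element type from the first
        # item and fall back to string for an empty list.
        element = spark_type_ddl(value[0]) if value else "string"
        return f"array<{element}>"
    # Strings, None, and anything unrecognised fall back to string.
    return "string"


def generate_ddl_schema(sample):
    """Build a 'col type, col type' string from a sample dictionary."""
    return ", ".join(f"{k} {spark_type_ddl(v)}" for k, v in sample.items())


sample = {"name": "a", "age": 3, "tags": ["x"], "address": {"zip": 1.5}}
print(generate_ddl_schema(sample))
```

A string in this shape can typically be passed where Spark accepts a DDL-formatted schema, for example `spark.read.schema(...)`, but check it against your Spark version before relying on it.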

firstworkdate Qlik Equivalent in Spark SQL

 Hi There! I was chatting with a friend who was facing a problem on a migration project. The old scripts and processes were built with Qlik, which I had never heard of until now. The script there was calling a function named firstworkdate. The firstworkdate function returns the latest starting date to achieve no_of_workdays (Monday-Friday) ending no later than end_date, taking into account any optionally listed holidays. end_date and holiday should be valid dates or timestamps. Here I have excluded the holiday part, though. Please suggest if you have a better way to implement it in mind. I still think it's a pretty lame solution, but it works. 😀 Here is a proposed solution:

select col2 as endDate,
  reverse(
    slice(
      reverse(
        filter(
          transform(
            sequence(date_sub(col2, col1 * 2), col2),
            x -> struct(x, weekday(x))
          ),
          x -> x.col2 not in (5, 6)
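To pin down what the function is expected to return, here is a small plain-Python reference sketch of the behaviour described above. It is not the Spark SQL solution and not Qlik's implementation: the name `first_work_date` is made up, and it simply walks backwards from the end date, counting Monday-Friday days that are not in an optional holiday list. It can be handy for checking the SQL output on a few rows.

```python
from datetime import date, timedelta


def first_work_date(end_date, no_of_workdays, holidays=()):
    """Latest start date such that no_of_workdays workdays end by end_date.

    Assumptions: workdays are Monday-Friday, and holidays is an optional
    iterable of datetime.date values to skip.
    """
    if no_of_workdays < 1:
        raise ValueError("no_of_workdays must be at least 1")
    skipped = set(holidays)
    current = end_date
    remaining = no_of_workdays
    while True:
        # date.weekday(): Monday is 0 and Sunday is 6, so 5 and 6 are the weekend.
        if current.weekday() < 5 and current not in skipped:
            remaining -= 1
            if remaining == 0:
                return current
        current -= timedelta(days=1)


# Nine workdays ending on Monday 29 Dec 2014 start on Wednesday 17 Dec 2014.
print(first_work_date(date(2014, 12, 29), 9))
# With 25 and 26 Dec treated as holidays, the start moves back to 15 Dec 2014.
print(first_work_date(date(2014, 12, 29), 9, [date(2014, 12, 25), date(2014, 12, 26)]))
```

Spark's `weekday` uses the same numbering (0 for Monday, 6 for Sunday), which is why the SQL filters out 5 and 6.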