Posts

Extract ZIP files in Azure Data Lake Storage using Python

Hi there, this example shows how to extract a ZIP file in Azure Data Lake Storage (ADLS) using Python. I recently needed to extract a ZIP file into ADLS from a Python script. The approach is the same if you are doing it from a Django or FastAPI service: read the files inside the ZIP into memory one at a time and upload each of them to ADLS. This is unlike Databricks, where the extraction can be done through mount points, so doing it from a Python script or service is a bit different. It is still very simple and short. To run this I created an ADLS Gen2 storage account and a container named samples.

"""
Author: PREETish
Reach me at: https://www.pritishranjan.com
Queries: https://preetblogs.azurewebsites.net/aboutme
Github: PreetRanjan
"""
import zipfile
import io
from azure.storage.filedatalake import FileSystemClient
from datetime import datetime

connection_string = "<your_connection_string>"
file_system_client = FileSystemClient.fro
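The in-memory approach described above could be sketched roughly like this. It is only a minimal sketch, not the code from the post: the names extract_zip_in_memory, make_adls_uploader and target_dir are hypothetical, and the upload step is passed in as a callable so the ZIP handling can be tried without an Azure account. The ADLS part assumes the azure-storage-file-datalake package and the samples container mentioned above.

```python
import io
import zipfile
from typing import Callable, List


def extract_zip_in_memory(
    zip_bytes: bytes,
    upload: Callable[[str, bytes], None],
    target_dir: str = "extracted",  # hypothetical destination folder
) -> List[str]:
    """Read each ZIP member into memory, one at a time, and pass it to `upload`."""
    uploaded = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue  # folders are implied by the file paths
            path = f"{target_dir}/{member.filename}"
            upload(path, archive.read(member))
            uploaded.append(path)
    return uploaded


def make_adls_uploader(connection_string: str, container: str = "samples"):
    """Build an `upload` callable backed by an ADLS Gen2 container (assumed setup)."""
    # Imported lazily so the ZIP logic above runs without the Azure SDK installed.
    from azure.storage.filedatalake import FileSystemClient

    file_system_client = FileSystemClient.from_connection_string(
        connection_string, file_system_name=container
    )

    def upload(path: str, data: bytes) -> None:
        # overwrite=True replaces a file that already exists at this path
        file_system_client.get_file_client(path).upload_data(data, overwrite=True)

    return upload
```

Something like extract_zip_in_memory(zip_bytes, make_adls_uploader("<your_connection_string>")) would then push every file in the archive to the container. Keeping the uploader separate is just a design choice to make the extraction loop easy to test locally.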

My Research Paper Published in IEEE

Hi all, I just wanted to share my research paper, which was published back in October 2021. It is about a web-based app that calculates the nitrogen concentration of a rice leaf using a Leaf Color Chart (LCC). The approach is to calculate the average RGB value and get the hex color code for every shade in the LCC. The Leaf Color Chart is a tool used to estimate nitrogen concentration, and the web/Android app that I made digitizes the LCC. You can read more on the LCC here: https://iiss.icar.gov.in/eMagazine/v3i1/9.pdf Here is the paper on the IEEE Xplore website: https://ieeexplore.ieee.org/document/9555875/ Thanks

Generate PySpark Schema dynamically in Python from JSON Sample

Hi folks, if you need to generate a PySpark schema from JSON you can always use my tool here: https://preetranjan.github.io/pyspark-schema-generator/ but if you need to do it in Python, here is the code snippet for it. It takes a Python dictionary as input and generates the PySpark schema.

import json
from pyspark.sql.types import *

def GeneratePySparkSchema(json):
    fields = []
    for key, value in json.items():
        if isinstance(value, dict):
            field = StructField(key, GeneratePySparkSchema(value), True)
        elif isinstance(value, list):
            if len(value) == 0:
                field = StructField(key, ArrayType(StringType()), True)
            elif isinstance(value[0], dict):
                field = StructField(
                    key, ArrayType(GeneratePySparkSchema(value[0]), True)
                )
            else:
                field = StructField(key, ArrayType(GetSparkData
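For readers who want to try the same recursive idea without a Spark installation, here is a hedged variation rather than the snippet from the post. It builds the plain-dictionary (JSON) form of a schema instead of StructType objects. The function names schema_as_json and spark_type_name are hypothetical, and the mapping from Python types to Spark type names is my own assumption, since the helper used in the original code is not shown in full.

```python
def spark_type_name(value):
    """Map a sample Python scalar to a Spark type name (assumed mapping)."""
    if isinstance(value, bool):  # check bool first: True is also an int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"  # fallback for str, None and anything unrecognized


def schema_as_json(sample: dict) -> dict:
    """Recursively describe a sample dictionary in Spark's JSON schema layout."""
    fields = []
    for key, value in sample.items():
        if isinstance(value, dict):
            data_type = schema_as_json(value)  # nested struct
        elif isinstance(value, list):
            if len(value) == 0:
                element = "string"  # empty list: default to string elements
            elif isinstance(value[0], dict):
                element = schema_as_json(value[0])
            else:
                element = spark_type_name(value[0])
            data_type = {"type": "array", "elementType": element, "containsNull": True}
        else:
            data_type = spark_type_name(value)
        fields.append(
            {"name": key, "type": data_type, "nullable": True, "metadata": {}}
        )
    return {"type": "struct", "fields": fields}
```

With PySpark available, StructType.fromJson(schema_as_json(sample)) should turn this dictionary into a regular schema object. Like the original, it only looks at the first element of a list, so mixed-type arrays are not handled.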

firstworkdate Qlik Equivalent in Spark SQL

Hi there! I was chatting with a friend who was facing a problem on a migration project. The old scripts and processes were built with Qlik, which I had never heard of until now. The script there was using a function called firstworkdate. The firstworkdate function returns the latest starting date to achieve no_of_workdays (Monday-Friday) ending no later than end_date, taking into account any optionally listed holidays. end_date and holiday should be valid dates or timestamps. Here I have excluded the holiday part, though. Please suggest if you have anything in mind to implement it. I still think it's a very lame solution, but it works. 😀 Here is a proposed solution:

select col2 as endDate,
  reverse(slice(reverse(filter(transform(sequence(date_sub(col2, col1 * 2), col2), x -> struct(x, weekday(x))), x -> x.col2 not in (5, 6)
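To make the expected behaviour easier to check, here is a small plain-Python sketch of the same rule. It is not a translation of the SQL above, only a reference under the same simplification (no holiday list). The name first_work_date is hypothetical. Python's date.weekday() numbers Monday as 0 and Sunday as 6, which matches the 5 and 6 filtered out in the query.

```python
from datetime import date, timedelta


def first_work_date(end_date: date, no_of_workdays: int) -> date:
    """Latest start date giving `no_of_workdays` Mon-Fri days ending by `end_date`."""
    if no_of_workdays < 1:
        raise ValueError("no_of_workdays must be at least 1")
    current = end_date
    remaining = no_of_workdays
    while True:
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
            if remaining == 0:
                return current
        current -= timedelta(days=1)  # step back one calendar day
```

Walking back one day at a time is slow for very large counts, but it is handy for validating the Spark SQL output on a few sample rows.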

Building a Login Flow with .NET MAUI

Let's build a login flow with .NET MAUI using Shell. Authentication is very common in mobile apps, so let's get started. Obviously, the app should ask for login only if the user isn't authenticated. We will check for authentication: if the user is not authenticated we move to the Login page, and if the login succeeds we move to the Home page. For this example we will override the back-button-pressed event to quit the application, but you can customize it as you need. For this post I am using a simple authentication, but you can use JWT or any method you want. Here is the example of the login flow: All the pages that are used need to be registered with the Shell. If you are a bit familiar with Shell navigation, the first content page is the one displayed after startup, so we need to order the pages in the Shell accordingly. The pages we are using for this example are LoadingPage, LoginPage, HomePage and SettingsPage. Here is the AppShell.xaml <

PySpark Schema Generator - A simple tool to generate PySpark schema from JSON data

Hi folks, I built a small tool that solves a problem data engineers face when dealing with JSON data. As we know, JSON data is semi-structured, and we usually ingest it and denormalize it into smaller tables for further processing. In my case I had to generate a PySpark schema from JSON to ingest the data, and the JSON structure changed often. The JSON I was dealing with was very complex, but let me give you an example of the problem the tool solves. For example, we have JSON coming from Kafka like the one below:

{
  "name": "PREETish ranjan",
  "dob": "2022-03-04T18:30:00.000Z",
  "status": "active",
  "isActive": true,
  "id": 102,
  "address": {
    "city": "Bhubaneswar",
    "PIN": 500016
  },
  "mobiles": ["8989898989", "5656565656"],
  "id_cards": [1, 2, 3, 4, 5]
}

The output I need is like this:

StructType([
    Str

Query Builder using Angular

Hi all, Initial update: 09/08/2022. I am building a kind of SQL query builder with nested statements and conditions using Angular and TypeScript. This is currently in development and not completed yet. I have used the ControlValueAccessor interface available in Angular with nested child components. The main idea is not mine at all; I referred to a blog to build this, but I changed it to fit my requirements. There are quite a few providers that offer Angular components for this, but I decided to make my own. Here are my initial screenshots. Screenshot 1: Screenshot 2: JSON structure for this: I don't have the blog reference now, I lost it. I will try to include it in the next update if I find it. The project is live. Please visit here: Query Builder. I will add more features to it. Thanks for reading.