Extract ZIP files in Azure Data Lake Storage using Python
Hi there,
This example shows how to extract a ZIP file in Azure Data Lake Storage (ADLS) using Python.
Recently I needed to extract a ZIP file into ADLS from a Python script. The same approach works if you are doing it from a Django or FastAPI service.
The approach is to read the ZIP file into memory, then go through its entries in sequence and upload each one to ADLS.
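The in-memory part of this can be tried locally without Azure. The minimal sketch below builds a small ZIP in memory (the file names and contents are made up for illustration), wraps the bytes in `io.BytesIO`, and reads each entry in sequence. This is the same loop the ADLS script runs before each upload. `read_zip_entries` is a hypothetical helper, not part of the original script.

```python
import io
import zipfile


def read_zip_entries(zip_data: bytes) -> list[tuple[str, bytes]]:
    """Return (name, contents) for every file entry in an in-memory ZIP."""
    entries = []
    with zipfile.ZipFile(io.BytesIO(zip_data), "r") as zip_file:
        for info in zip_file.infolist():
            # Folder entries (names ending in "/") carry no data, so skip them
            if info.is_dir():
                continue
            with zip_file.open(info) as member:
                entries.append((info.filename, member.read()))
    return entries


# Build a sample archive entirely in memory (hypothetical file names)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/a.csv", "id,value\n1,10\n")
    archive.writestr("readme.txt", "hello")

for name, contents in read_zip_entries(buffer.getvalue()):
    print(name, len(contents))
```

Nothing touches the local disk here. The whole archive and every extracted entry live in memory, which is why this pattern suits a script or web service that has no mounted storage.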
This differs from Databricks, where extraction can be done through mount points. Extracting a ZIP from a plain Python script or service works a bit differently.
It's very simple and short.
To run this, I created an ADLS Gen2 storage account and a container named samples.
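Instead of pasting the connection string into the script, you can read it from an environment variable. The sketch below assumes a variable named `AZURE_STORAGE_CONNECTION_STRING`. That name was chosen for illustration and the script does not require it. `get_connection_string` is likewise a hypothetical helper.

```python
import os


def get_connection_string(env_var: str = "AZURE_STORAGE_CONNECTION_STRING") -> str:
    """Read the storage connection string from an environment variable (name is an assumption)."""
    value = os.environ.get(env_var)
    if not value:
        # Fail early with a clear message instead of a confusing SDK error later
        raise RuntimeError(f"Set {env_var} to your storage account connection string")
    return value
```

The returned value can then replace the `"<your_connection_string>"` placeholder passed to `FileSystemClient.from_connection_string(..., file_system_name="samples")`.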
```python
"""
Author: PREETish
Reach me at: https://www.pritishranjan.com
Queries: https://preetblogs.azurewebsites.net/aboutme
Github: PreetRanjan
"""
import io
import zipfile

from azure.storage.filedatalake import FileSystemClient

connection_string = "<your_connection_string>"
file_system_client = FileSystemClient.from_connection_string(connection_string, file_system_name="samples")


def upload_bytes_to_adls(file_system_client, file_path, file_contents):
    file_client = file_system_client.get_file_client(file_path)
    # Upload bytes to the file, replacing it if it already exists
    file_client.upload_data(file_contents, overwrite=True)


def read_file_from_adls(file_system_client, file_path):
    file_client = file_system_client.get_file_client(file_path)
    # Download the whole file into memory as bytes
    download = file_client.download_file()
    downloaded_bytes = download.readall()
    return downloaded_bytes


def extract_zip_in_adls(file_system_client, zip_data, extract_dir):
    with io.BytesIO(zip_data) as zip_buffer:
        with zipfile.ZipFile(zip_buffer, "r") as zip_file:
            for file_name in zip_file.namelist():
                # Skip folder entries (names ending in "/"); they have no content to upload
                if file_name.endswith("/"):
                    continue
                with zip_file.open(file_name) as file_in_zip:
                    extract_path = extract_dir + file_name
                    print("Extract & Upload to: ", extract_path)
                    upload_bytes_to_adls(file_system_client, extract_path, file_in_zip.read())
                    print("Uploaded!!")


zip_file_path = "drivetime.zip"
print("Reading ZIP file:", zip_file_path)
zip_bytes = read_file_from_adls(file_system_client, zip_file_path)
print("Zip file Read. Size: ", len(zip_bytes), " Bytes")

# Extract ZIP files and upload each file to ADLS
print("Running Extract and Upload...")
extract_zip_in_adls(file_system_client, zip_bytes, "Extract/")
```
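If you want to test the extraction logic without an Azure account, one option is to pass the upload step in as a function. The sketch below is a variation on `extract_zip_in_adls`, not the original script. `extract_zip` and its `upload` callback are hypothetical names. It also uses `posixpath.join`, so the target path gets a "/" separator even when `extract_dir` has no trailing slash, whereas the original `extract_dir + file_name` relies on the caller passing `"Extract/"`.

```python
import io
import posixpath
import zipfile
from typing import Callable


def extract_zip(zip_data: bytes, extract_dir: str, upload: Callable[[str, bytes], None]) -> list[str]:
    """Hand every file entry of an in-memory ZIP to upload(path, data).

    `upload` is a hypothetical callback; against ADLS it could wrap
    upload_bytes_to_adls(file_system_client, path, data).
    """
    uploaded = []
    with zipfile.ZipFile(io.BytesIO(zip_data), "r") as zip_file:
        for info in zip_file.infolist():
            # Folder entries have no content to upload
            if info.is_dir():
                continue
            # posixpath.join inserts "/" only when extract_dir lacks it
            target = posixpath.join(extract_dir, info.filename)
            with zip_file.open(info) as member:
                upload(target, member.read())
            uploaded.append(target)
    return uploaded


# Local usage: collect the "uploads" in a dict instead of calling Azure
fake_store: dict[str, bytes] = {}


def save(path: str, data: bytes) -> None:
    fake_store[path] = data


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("2024/trips.csv", "a,b\n")

extract_zip(buffer.getvalue(), "Extract", save)
print(fake_store)
```

Against ADLS the call would look like `extract_zip(zip_bytes, "Extract", lambda path, data: upload_bytes_to_adls(file_system_client, path, data))`. The extraction loop stays the same, and only the callback changes between a local test and the real upload.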
You can check the code on my GitHub:
Thanks