0 votes
1 answer
82 views

Accessing Azure Key Vault Secrets list from Fabric Notebook using Managed Private Endpoint

I'm trying to retrieve the list of secrets from an Azure Key Vault using a Fabric Notebook in PySpark. I have a Managed Private Endpoint configured in the Fabric workspace pointing to the Key Vault: ...
coding • 167
1 vote
2 answers
49 views

How can I use a PySpark UDF in a for loop?

I need a PySpark UDF with a for loop to create new columns but with conditions based on the iterator value. def test_map(col): if x == 1: if col < 0.55: return 1.2 ...
Chuck • 1,305
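The excerpt above cuts off, but the usual fix for per-iteration UDF logic is a factory function that captures the loop variable (a bare `x` inside the UDF, as in the excerpt, is not bound per iteration). A minimal sketch; column names and thresholds here are hypothetical, not the asker's:

```python
# Hypothetical sketch: build one mapping function per loop iteration so
# each generated column captures its own value of x.
def make_mapper(x):
    def mapper(value):
        if x == 1:
            return 1.2 if value < 0.55 else 1.0
        return float(x)  # placeholder branch for other iterations
    return mapper

# Wrapped for Spark inside the loop (sketch, assuming a "score" column):
#   from pyspark.sql import functions as F
#   from pyspark.sql.types import DoubleType
#   for x in range(1, 4):
#       df = df.withColumn(f"mapped_{x}",
#                          F.udf(make_mapper(x), DoubleType())("score"))
```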
1 vote
2 answers
56 views

Databricks dataframe join - ambiguous columns

I am facing a problem in my Databricks Delta Live Tables (DLT) notebook. I am trying to join two dataframes, one of which is derived from the other, but I keep getting the following error: ...
Mads • 37
0 votes
1 answer
34 views

Best Practices for Selecting Primary Key Combinations from Multiple Columns

I am working in Azure Databricks with a large PySpark DataFrame that has 170 columns. I need to identify the best possible combination of 2-3 columns to use as the primary key, ensuring: Uniqueness: ...
anuj • 142
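One way to approach the uniqueness check described above is to count distinct value tuples per column combination. A pure-Python sketch of the idea; on a real PySpark DataFrame you would compare `df.select(*combo).distinct().count()` against `df.count()` instead of materializing rows:

```python
from itertools import combinations

def candidate_keys(rows, columns, k=2):
    """Return k-column combinations whose value tuples are unique per row.
    Pure-Python sketch of the check; rows is a list of dicts.
    In PySpark, swap the set comprehension for
    df.select(*combo).distinct().count() == df.count()."""
    total = len(rows)
    found = []
    for combo in combinations(columns, k):
        distinct = {tuple(row[c] for c in combo) for row in rows}
        if len(distinct) == total:
            found.append(combo)
    return found
```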
0 votes
0 answers
29 views

Allocation of Executors in Callee/Caller Spark Notebooks

I am working in Azure Synapse Analytics and have a wrapper notebook calling two other notebooks, so I am running something like: mssparkutils.notebook.run('/notebook_1_path', 3600) mssparkutils.notebook....
Michael
1 vote
0 answers
39 views

spark connect udf fails with "SparkContext or SparkSession should be created first"

I have a Spark Connect server running. Things are fine when I don't use UDFs (df.show() always works fine). But when I use UDF, it fails with SparkContext or SparkSession should be created first. ...
Kashyap • 17.5k
-5 votes
0 answers
31 views

TypeError: 'Column' object is not callable in pyspark file reading [closed]

Trying to plot a histogram; it says 'Column' object is not callable. I am using the Hadoop framework and doing a churn analysis using PySpark, reading the file in PySpark. I am learning Python as a newcomer and first time ...
Anuj Patil
0 votes
0 answers
31 views

Filtering a dataframe provides a different output every time

I'm having an issue creating this Python script with PySpark. I am pulling old data from two sources, merging them, adding some columns to the dataframe, and then applying a filter on them to ...
Cris Manrique
0 votes
0 answers
20 views

Unable to read messages from kafka broker with PySpark

Problem: I am trying to connect to a Kafka broker using PySpark, but when consuming messages from the test-topic, I receive NULL values instead of the expected JSON content. consumer.py code def ...
KurczakChrupiacy2
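NULL values from Kafka usually mean the message bytes were never deserialized. A pure-Python sketch of the decode step a plain consumer's value deserializer would apply (the Spark-side equivalent, shown in the comment, is casting the `value` column to string):

```python
import json

def decode_kafka_value(raw):
    """Kafka delivers message values as raw bytes; a missing or wrong
    value deserializer is a common cause of NULLs. Sketch of the decode
    step applied to each message value."""
    if raw is None:  # tombstone or genuinely empty message
        return None
    return json.loads(raw.decode("utf-8"))

# In Spark itself the equivalent is (sketch, broker address hypothetical):
#   df = spark.read.format("kafka") \
#       .option("kafka.bootstrap.servers", "localhost:9092") \
#       .option("subscribe", "test-topic").load() \
#       .selectExpr("CAST(value AS STRING) AS value")
```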
-1 votes
0 answers
27 views

Pyspark - spark-submit logging for both driver and executor

New to PySpark, I am using spark-submit to execute the program and the logging.config package to log the executor logs to a file and email exceptions. But logging is not working; nothing is ...
Kavya shree
0 votes
0 answers
31 views

How to flatten nested JSON in pyspark

I have a JSON file that looks like this: [ { "student_id": 1234, "room_id": "abc", "enrolled": false }, { "student_id": 4321, ...
unlocknew
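A hedged pure-Python sketch of the flattening idea: recurse over nested objects, joining key names with underscores. With PySpark proper you would walk `df.schema` and select `col("parent.child").alias("parent_child")` instead:

```python
def flatten_record(record, prefix=""):
    """Recursively flatten nested dicts into one level, joining keys
    with underscores. Pure-Python sketch; in PySpark you would walk
    df.schema and select col("a.b").alias("a_b") for struct fields."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_record(value, name + "_"))
        else:
            flat[name] = value
    return flat
```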
0 votes
0 answers
28 views

Delta Lake Merge Rewrites unchanged files

I want to do a merge on a subset of my delta table partitions to do incremental upserts to keep two tables in sync. I do not use a whenNotMatchedBySource statement to clean up stale rows in my target ...
ExploitedRoutine
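A common cause of rewritten unchanged files is a merge condition that does not pin the touched partitions, so Delta cannot prune files outside them. A small sketch with a hypothetical helper and hypothetical column names (`load_date`, `id`); the real table's partition and key columns would go here:

```python
def merge_condition(partitions, part_col="load_date", key="id"):
    """Build a Delta MERGE condition that pins the target partitions so
    files outside them can be pruned rather than rewritten.
    Hypothetical column names for illustration."""
    values = ", ".join(f"'{p}'" for p in partitions)
    return f"t.{part_col} IN ({values}) AND t.{key} = s.{key}"

# Used with delta-spark (sketch, table and dataframe names hypothetical):
#   DeltaTable.forName(spark, "target").alias("t") \
#       .merge(updates.alias("s"), merge_condition(["2024-01-01"])) \
#       .whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
```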
0 votes
1 answer
27 views

Databricks: Generate Multiple Excels for SQL Query

I am getting "OSError: [Errno 95] Operation not supported" for the code below. I have openpyxl 3.1.5 installed on the cluster and have imported all required modules. I am sure this is something ...
libpekin1847
0 votes
0 answers
36 views

How to concatenate JSON key columns using a comma as separator

I have a problem to resolve. Having a JSON file like: { "database": { "table_1": { "load_type": "Delta", "columns...
Julio • 551
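Assuming the truncated config holds a list of column names per table (a guess at the cut-off file), the joining step is a one-liner with `str.join`; in PySpark, `concat_ws(",", ...)` is the DataFrame-side equivalent:

```python
def columns_csv(config):
    """Join each table's column names with commas. The config structure
    here is a guess at the truncated file in the question."""
    return {table: ",".join(spec["columns"])
            for table, spec in config["database"].items()}

# Hypothetical example mirroring the excerpt's shape:
example = {"database": {"table_1": {"load_type": "Delta",
                                    "columns": ["id", "name", "updated_at"]}}}
```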
0 votes
0 answers
25 views

Py4JJavaError: An error occurred while calling o745.save: org.apache.spark.SparkException

I've just started working with Spark. I've built and trained the model, but I'm having trouble saving it. from pyspark.ml.regression import GBTRegressor gbt = GBTRegressor(featuresCol="features",...
Burak Turan
0 votes
1 answer
30 views

DLT pipeline inserts pipeline Update ID into the table name and raises permission denied error on it

I have a DLT pipeline in Databricks that I am trying to execute - not for the first time as it has worked before, but I'm seeing this strange behaviour in the pipeline such that it uses the Update ID ...
Rcheologist
0 votes
0 answers
33 views

Suppress py4j.clientserver logs in pyspark (databricks)

This seems to have been asked a few times, but I am raising it again since none of the answers work for me. This is the problem I have (databricks db article): I have a Python whl task in Databricks (...
Saugat Mukherjee
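When log4j-side configuration doesn't help, silencing the Python-side logger directly often does, since py4j logs through the standard `logging` module. A minimal sketch:

```python
import logging

# Minimal sketch: raise the py4j loggers above INFO so their
# client-server chatter is dropped before reaching any handler,
# leaving the root logger and your own loggers untouched.
logging.getLogger("py4j").setLevel(logging.ERROR)
logging.getLogger("py4j.clientserver").setLevel(logging.ERROR)
```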
0 votes
0 answers
17 views

Using pyspark databricks UDFs with outside function imports

Problem with minimal example: the minimal example below does not run locally with databricks-connect==15.3 but does run within the Databricks workspace. main.py from databricks.connect import ...
Tobias • 1