python - Error handling in PySpark when reading in non-existent files
I have a massive list of directories and files I potentially read from. Some of them may not exist, and that is not a problem: I want to ignore the error, as I would with a try/except block. Is there a way I can allow for this in PySpark?
Here is the returned error message:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:
I am building an RDD from a series of files as follows:
sci = sc.textFile(",".join(paths))
where paths is a list of paths to possible files. I could check the file system to see if each one exists, but is there a more elegant way of doing this?
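For reference, the filesystem pre-check I had in mind would look something like the sketch below. It assumes the paths list above and an existing SparkContext named sc, and it only covers local files (the file: scheme in the error above); paths on HDFS would need the Hadoop FileSystem API instead of os.path.

import os.path

# Keep only the paths that actually exist on the local filesystem,
# then hand the survivors to textFile as before.
existing = [p for p in paths if os.path.exists(p)]
sci = sc.textFile(",".join(existing))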
The following should work:
for f in file_list:
    try:
        read_file(f)
    except org.apache.hadoop.mapred.InvalidInputException:
        deal_with_absent_file(f)
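As a runnable version of that idea: in PySpark you cannot catch the Java exception by its Java class name; it surfaces wrapped in a Py4JJavaError, and only once an action forces the read, because textFile is lazy. The sketch below assumes an existing SparkContext named sc; read_file and deal_with_absent_file are the hypothetical helpers from the code above.

from py4j.protocol import Py4JJavaError

def read_file(sc, path):
    # textFile is lazy, so force a small read here; a missing file
    # fails at this point rather than at some later action.
    rdd = sc.textFile(path)
    rdd.take(1)  # take(1) touches at most one record and never raises on empty input
    return rdd

good_rdds = []
for f in file_list:
    try:
        good_rdds.append(read_file(sc, f))
    except Py4JJavaError as e:
        # Only swallow the missing-input case; re-raise anything else.
        if "InvalidInputException" in str(e.java_exception):
            deal_with_absent_file(f)  # hypothetical handler, e.g. log and skip
        else:
            raise

The surviving RDDs can then be combined with sc.union(good_rdds), which gives the same result as a single textFile call over only the paths that exist.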