python - Error handling in PySpark when reading non-existent files -


I have a massive list of directories and files that I potentially read from. Some of them may not exist, and that is not a problem: I just want to ignore the error, which I would normally do with a try/except. Is there a way to allow that in PySpark?

Here is the returned error message:

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file: 

I am building up the series of files like the following:

sci = sc.textFile(",".join(paths)) 

where paths is a list of paths to possible files. I could check the file system and see if they exist, but is there a more elegant way of doing this?
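If the inputs live on the local filesystem (the traceback above shows the file: scheme), one simple option is to filter the list before joining it. A minimal sketch, reusing the sc and paths from the question; note that os.path.exists only works for local paths, which is an assumption here, and an HDFS input would need a different existence check:

import os

# Drop missing inputs up front so sc.textFile never sees them.
# Only valid for local file: paths, not HDFS.
existing = [p for p in paths if os.path.exists(p)]
sci = sc.textFile(",".join(existing))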

The following should work. Note that you cannot catch the Java exception class directly in Python; as the traceback above shows, it reaches the Python side wrapped in py4j's Py4JJavaError:

from py4j.protocol import Py4JJavaError

for f in file_list:
    try:
        read_file(f)
    # The Java org.apache.hadoop.mapred.InvalidInputException arrives
    # wrapped in a Py4JJavaError, so that is what must be caught.
    except Py4JJavaError:
        deal_with_absent_file(f)
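One caveat: sc.textFile is lazy, so the InvalidInputException only fires once an action runs (which is why the traceback above mentions collectAndServe). For the try/except to catch anything, read_file has to force an action. A minimal runnable sketch, where the take(1) probe and the skip-and-log handling are my assumptions rather than anything from the question:

from py4j.protocol import Py4JJavaError
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

def read_file(path):
    # textFile is lazy; take(1) forces Spark to compute the input
    # splits, which is where a missing path actually fails.
    rdd = sc.textFile(path)
    rdd.take(1)
    return rdd

readable = []
for f in file_list:
    try:
        readable.append(read_file(f))
    except Py4JJavaError as e:
        # Only swallow the missing-input case; re-raise anything else.
        if "InvalidInputException" in str(e):
            print("skipping missing input: " + f)
        else:
            raise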
