python - Error handling in PySpark when reading in non-existent files
I have a massive list of directories and files I potentially read from. Some of them may not exist, and that is not a problem: I want to ignore the error, as I would with a try/except block. Is there a way I can allow for this in PySpark?
Here is the returned error message:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:
I am building an RDD from a series of files as follows:
sci = sc.textFile(",".join(paths))
where paths is a list of paths to possible files. I could check the file system to see if each one exists, but is there a more elegant way of doing this?
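For reference, the filesystem pre-check I had in mind would look something like the sketch below. It assumes the paths list above and an existing SparkContext named sc, and it only covers local files (the file: scheme in the error above); paths on HDFS would need the Hadoop FileSystem API instead of os.path.

import os.path

# Keep only the paths that actually exist on the local filesystem,
# then hand the survivors to textFile as before.
existing = [p for p in paths if os.path.exists(p)]
sci = sc.textFile(",".join(existing))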
The following should work:
for f in file_list:
    try:
        read_file(f)
    except org.apache.hadoop.mapred.InvalidInputException:
        deal_with_absent_file(f)
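As a runnable version of that idea: in PySpark you cannot catch the Java exception by its Java class name; it surfaces wrapped in a Py4JJavaError, and only once an action forces the read, because textFile is lazy. The sketch below assumes an existing SparkContext named sc; read_file and deal_with_absent_file are the hypothetical helpers from the code above.

from py4j.protocol import Py4JJavaError

def read_file(sc, path):
    # textFile is lazy, so force a small read here; a missing file
    # fails at this point rather than at some later action.
    rdd = sc.textFile(path)
    rdd.take(1)  # take(1) touches at most one record and never raises on empty input
    return rdd

good_rdds = []
for f in file_list:
    try:
        good_rdds.append(read_file(sc, f))
    except Py4JJavaError as e:
        # Only swallow the missing-input case; re-raise anything else.
        if "InvalidInputException" in str(e.java_exception):
            deal_with_absent_file(f)  # hypothetical handler, e.g. log and skip
        else:
            raise

The surviving RDDs can then be combined with sc.union(good_rdds), which gives the same result as a single textFile call over only the paths that exist.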