Conversation
```python
from pyspark import SparkContext
from commons.Utils import Utils


def splitComma(line: str):
```
Have you tried to run this program? It doesn't compile:

```
File "/Users/cwei/code/python-spark-tutorial-new/rdd/airports/AirportsByLatitudeSolution.py", line 4
    def splitComma(line: str):
                       ^
SyntaxError: invalid syntax
```
Hi, yes, I did run all the programs. Which version of Python are you running? This should work in the latest Python 3 versions.
Sorry for the confusion. It works in Python 3; I was running Python 2.7. Feel free to ignore this comment.
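For context, parameter annotations (PEP 3107) were introduced in Python 3, so Python 2.7 stops at the annotation's colon with exactly this SyntaxError. A minimal sketch of the two spellings; the function bodies here are hypothetical stand-ins, not the repo's actual Utils logic:

```python
# Python 3: the ": str" annotation is documentation only, not enforced.
def splitComma(line: str):
    # Hypothetical splitting logic; the course's Utils helper may differ.
    return line.split(",")


# Python 2.7-compatible spelling: drop the annotation entirely.
def splitCommaCompat(line):
    return line.split(",")
```

Both functions behave identically at runtime; the annotation only affects which interpreters can parse the file.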
For all the programs that print to standard output, please set the logging level to ERROR so that there is less noise in the output.
```python
from pyspark import SparkContext
from commons.Utils import Utils


def splitComma(line: str):
```
Again, it didn't compile; I think you don't need the type annotation:

```
File "/Users/cwei/code/python-spark-tutorial-new/rdd/airports/AirportsInUsaSolution.py", line 4
    def splitComma(line: str):
                       ^
SyntaxError: invalid syntax
```
```python
from pyspark import SparkContext


if __name__ == "__main__":
    sc = SparkContext("local", "collect")
```
Please set the logging level to ERROR, similar to what the Scala version does, to reduce the noise in the output.
Hi James, some considerations about the logging level when using PySpark:

- From the script itself, PySpark can only set the log level after the SparkContext has started, which means logs printed while the SparkContext is starting up will appear anyway.
- The best way to reduce the noise in the output is to configure the log4j.properties file inside the spark/conf folder.

That being said, I will set the log level to ERROR after the SparkContext starts.
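For what it's worth, the conf-file option mentioned above can be sketched like this; the appender name is an assumption based on Spark's bundled log4j.properties.template, so check your local copy:

```
# spark/conf/log4j.properties (sketch; start from log4j.properties.template)
# Raise the root logger threshold so only errors reach the console.
log4j.rootCategory=ERROR, console
```

From the script itself, the equivalent runtime call is `sc.setLogLevel("ERROR")`, placed immediately after the SparkContext is constructed; startup logs emitted before that call will still appear.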
```
@@ -0,0 +1,11 @@
from pyspark import SparkContext


if __name__ == "__main__":
```
Again, set the logging level to ERROR.
```
@@ -0,0 +1,17 @@
from pyspark import SparkContext


def isNotHeader(line:str):
```
This one doesn't compile either:

```
def isNotHeader(line:str):
                    ^
SyntaxError: invalid syntax
```
```
@@ -0,0 +1,8 @@
from pyspark import SparkContext


if __name__ == "__main__":
```
Set the logging level to ERROR.
Converted all the Scala files in the RDD folder to Python.