Embedding a Python program in Pig

The Python program is shown below; it is saved at /home/zkf/File/Pig/untitled0.py

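#!/usr/bin/env python
# The shebang lets Pig run the shipped file directly as `untitled0.py`.
import sys

# Pig streams each tuple to stdin as one tab-delimited line.
for line in sys.stdin:
    (c, n, s) = line.split()
    # Emit only records with a passing score (>= 60), tab-delimited.
    if int(s) >= 60:
        # print() with one argument works on Python 2 and Python 3.
        print("%s\t%s\t%s" % (c, n, s))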
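The script above raises an exception on any line that does not have exactly three fields, or whose score is not an integer, and that would fail the whole streaming task. A minimal sketch of a more defensive variant follows. The function name `passing_lines` and the `threshold` parameter are illustrative assumptions, not part of the original script.

```python
def passing_lines(lines, threshold=60):
    """Yield tab-separated records whose score is at least `threshold`.

    `passing_lines` and `threshold` are hypothetical names for this sketch.
    Malformed lines are skipped instead of raising an exception.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # wrong number of fields (includes blank lines)
        class_no, stud_no, score = fields
        try:
            passed = int(score) >= threshold
        except ValueError:
            continue  # score is not an integer
        if passed:
            yield "%s\t%s\t%s" % (class_no, stud_no, score)
```

To use it as the streaming command, the script body would become a loop such as `for record in passing_lines(sys.stdin): print(record)`. Whether silently dropping bad lines is acceptable depends on the data; logging them to stderr may be preferable.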


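The Pig script is shown below; it is saved at /home/zkf/File/Pig/testPython-Pig/testPython-pig.pig
-- Load the colon-delimited student records from HDFS.
records = load '/user/student.txt' using PigStorage(':') as (classNo:chararray, studNo:chararray, score:int);
dump records;
-- Ship the local Python script to the cluster and define it as a streaming command.
define pass `untitled0.py` SHIP('/home/zkf/File/Pig/untitled0.py');
-- Stream every record through the script; only passing records come back.
records_pass = stream records through pass as (classNo:chararray, studNo:chararray, score:int);
dump records_pass;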
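Although /user/student.txt is colon-delimited, the Python script never sees the colons. By default, `stream` writes each tuple to the command's standard input as one tab-delimited line and splits each line of standard output on tabs in the same way. The sketch below imitates that round trip in plain Python to show why `line.split()` works in the script. The helper names `to_stream_line` and `from_stream_line` are made up for illustration and are not Pig APIs.

```python
def to_stream_line(record):
    """Imitate how a tuple reaches the script: fields joined by tabs."""
    return "\t".join(str(field) for field in record) + "\n"


def from_stream_line(line):
    """Imitate how a script output line is split back into fields."""
    return tuple(line.rstrip("\n").split("\t"))


# A tuple as loaded by PigStorage(':') from the line "C01:N0101:82".
record = ("C01", "N0101", 82)
line = to_stream_line(record)            # tab-delimited text line
class_no, stud_no, score = line.split()  # what the streaming script does
fields = from_stream_line("%s\t%s\t%s" % (class_no, stud_no, score))
```

The fields come back as strings. The `as (...)` clause on the `stream` statement is what tells Pig to cast `score` back to `int`.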


The input file is shown below; it is stored on the distributed file system (HDFS) at /user/student.txt
C01:N0101:82
C01:N0102:59
C01:N0103:65
C02:N0201:81
C02:N0202:82
C02:N0203:79
C03:N0301:56
C03:N0302:92
C03:N0306:72

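The output of the run is shown below. The INFO log lines and job statistics can be ignored, since they only report that the job ran successfully; the tuples at the very end are the result we want: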
2017-03-22 22:09:10,242 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: STREAMING
2017-03-22 22:09:10,243 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2017-03-22 22:09:10,247 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-03-22 22:09:10,248 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-03-22 22:09:10,248 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-03-22 22:09:10,252 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2017-03-22 22:09:10,252 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-03-22 22:09:10,253 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job287657154798300158.jar
2017-03-22 22:09:12,274 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job287657154798300158.jar created
2017-03-22 22:09:12,811 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-03-22 22:09:12,812 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-03-22 22:09:12,812 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2017-03-22 22:09:12,812 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-03-22 22:09:12,812 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Map only job, skipping reducer estimation
2017-03-22 22:09:12,820 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-03-22 22:09:12,934 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-22 22:09:12,936 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-03-22 22:09:12,937 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-03-22 22:09:13,321 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201703222151_0004
2017-03-22 22:09:13,321 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases records,records_pass
2017-03-22 22:09:13,321 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: records[1,10],records[-1,-1],records_pass[7,15],records_pass[-1,-1] C:  R: 
2017-03-22 22:09:13,321 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201703222151_0004
2017-03-22 22:09:13,323 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-03-22 22:09:17,332 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2017-03-22 22:09:23,353 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-03-22 22:09:23,353 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 


HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.2.1 0.12.0 zkf 2017-03-22 22:09:10 2017-03-22 22:09:23 STREAMING


Success!


Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_201703222151_0004 1 0 2 2 2 2 n/a n/a n/a n/a records,records_pass STREAMING,MAP_ONLY hdfs://localhost:9000/tmp/temp-266321578/tmp-846389344,


Input(s):
Successfully read 9 records (474 bytes) from: "/user/student.txt"


Output(s):
Successfully stored 7 records (140 bytes) in: "hdfs://localhost:9000/tmp/temp-266321578/tmp-846389344"


Counters:
Total records written : 7
Total bytes written : 140
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0


Job DAG:
job_201703222151_0004




2017-03-22 22:09:23,357 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2017-03-22 22:09:23,358 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-03-22 22:09:23,361 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-22 22:09:23,361 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

(C01,N0101,82)
(C01,N0103,65)
(C02,N0201,81)
(C02,N0202,82)
(C02,N0203,79)
(C03,N0302,92)
(C03,N0306,72)
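The counts reported in the log (9 records read, 7 records stored) and the tuples above can be cross-checked without a cluster. The following is a rough local re-creation of the same load, filter, and dump steps in plain Python, using the sample data from this post. It does not call Pig, and `SAMPLE` and `dump_passing` are names invented for this sketch.

```python
# Sample data copied from /user/student.txt as listed in this post.
SAMPLE = """\
C01:N0101:82
C01:N0102:59
C01:N0103:65
C02:N0201:81
C02:N0202:82
C02:N0203:79
C03:N0301:56
C03:N0302:92
C03:N0306:72
"""


def dump_passing(text, threshold=60):
    """Return (records read, list of passing (classNo, studNo, score))."""
    # Mirrors PigStorage(':'): one record per line, fields split on ':'.
    records = [line.split(":") for line in text.splitlines() if line]
    # Mirrors the streaming script: keep scores of at least `threshold`.
    passed = [(c, n, int(s)) for c, n, s in records if int(s) >= threshold]
    return len(records), passed


total, passed = dump_passing(SAMPLE)
print("read %d records, kept %d" % (total, len(passed)))
for c, n, s in passed:
    # Format each record the way `dump` prints a tuple.
    print("(%s,%s,%d)" % (c, n, s))
```

The two records that are dropped are the ones with scores 59 and 56, which is consistent with the difference between the read and stored counts in the job statistics.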
