SAS BASE Key Points Notes (SAS BASE要点笔记.doc)

Resource ID: 2396044; file size: 719 KB; length: 119 pages
Format: DOC


Topic: Accessing Data and Creating Data Structures
1. Reading raw data files using INFILE and INPUT statements
2. Writing to a _NULL_ data set
3. Assigning and changing variable attributes
4. Importing a database table or data file into a SAS data set
5. Labeling variables
6. Reading an existing SAS data set
7. Restricting observations while reading data
8. Creating temporary and permanent SAS data sets
9. Exporting data to different files
10. Displaying the contents of a data set
11. Restricting observations and variables in a SAS data set processed

1. Reading raw data files using INFILE and INPUT statements

1.1 Introduction

1.1.1 Common Step Boundary Keywords:
DATA, PROC, CARDS, DATALINES, QUIT, RUN

1.1.2 Data Step Flow

data sales;
  infile rawin;
  input name $1-10 division $12 years 15-16 sales 19-25;
run;

proc print data=sales;
run;

Note: Using RUN after each step is highly recommended.

A. The Compilation Phase
When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable, and determines whether a type conversion is necessary for each subsequent reference to a variable. During the compilation phase, SAS creates the following three items:

input buffer
  is a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement. Note that this buffer is created only when the DATA step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data directly into the program data vector.)

program data vector (PDV)
  is a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the program data vector.
  From here, SAS writes the values to a SAS data set as a single observation. Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution. The value of _ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more errors have occurred). SAS does not write these variables to the output data set.

descriptor information
  is information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

B. The Execution Phase
By default, a simple DATA step iterates once for each observation that is being created. The flow of action in the Execution Phase of a simple DATA step is described as follows:
1. The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
2. SAS sets the newly created program variables to missing in the program data vector (PDV).
3. SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
4. SAS executes any subsequent programming statements for the current record.
5. At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing in the program data vector.
   Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here.
6. SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation.
7. The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.

Note: The steps above show the default processing of the DATA step. You can code data-reading statements (such as INPUT or SET), or data-writing statements (such as OUTPUT), in any order in your program.

[Figure: Flow of Action in the DATA Step]

Diagnosing Errors in the Compilation Phase
Now that you know how a DATA step is processed, you can use that knowledge to correct errors. Errors that are detected during the compilation phase include
· misspelled keywords and data set names
· missing semicolons
· unbalanced quotation marks
· invalid options.

During the compilation phase, SAS can interpret some syntax errors (such as the keyword DATA misspelled as DAAT). If it cannot interpret the error, SAS
· prints the word ERROR followed by an error message in the log
· compiles but does not execute the step where the error occurred, and prints the following message to warn you:
  NOTE: The SAS System stopped processing this step because of errors.

Some errors are explained fully by the message that SAS prints; other error messages are not as easy to interpret. For example, because SAS statements are free-format, when you fail to end a SAS statement with a semicolon, SAS does not always detect the error at the point where it occurs.

Diagnosing Errors in the Execution Phase
As you have seen, errors can occur in the compilation phase, resulting in a DATA step that is compiled but not executed. Errors can also occur during the execution phase. When SAS detects an error in the execution phase, the following can occur, depending on the type of error:
· A note, warning, or error message is displayed in the log.
· The values that are stored in the program data vector are displayed in the log.
· The processing of the step either continues or stops.

1.2 Basic Forms of INPUT Statement
The most common way to create new data sets is by submitting a DATA step. The INPUT statement describes what data will be contained in your new data set. It is used to read data from an external source, or from lines contained in your SAS program.

1.2.1 List Input
Use list input to read data recorded with at least one blank space separating each data field. Missing values are represented as a dot (period). This is the simplest form of input (free-form, or format-free, list input).

DATA Census;
  INPUT State $ Pop;
  CARDS;
NC 5.085
SC 2.590
VA 1.360
MA 3.450
PA .
;
run;

1.2.2 Column Input
Use column input to read the following types of data. Because each variable names its own columns, the fields can be read in any order.
- Character and numeric data
- Data values which are entered in fixed column positions
- Character values longer than eight characters
- Character values that contain embedded blanks

Form: INPUT variable <modifier> startcol - endcol;

Example:

DATA Census;
  INPUT State $ 1-2 Pop 3-7;
  CARDS;
NC5.082
SC2.590
VA1.360
;
run;

DATA Census2;
  INPUT State $ 1-10 Pop 11-15;
  CARDS;
New York  5.082
New Jersey2.590
;
run;

The numbers after each variable name in the INPUT statement indicate the columns in which this variable can be found. The $ indicates that the variable is character.
Notice that the primary difference between this "column" INPUT statement and the "free-form list" INPUT statement is the inclusion of column ranges telling SAS where in the data line to find the information for each variable.

1.2.3 Formatted Input
Formatted input is a technique for reading data that allows you to specify the beginning column of a field and optionally its type and informat.

Form: INPUT pointer-control variable <modifiers>;

Example:

data inpt;
  input @1 custom_id $6. @7 custom_name $6. @14 rental_fee;
  datalines;
240442Smith   950
240910Yang   1120
240808Andrew 1050
;
run;

Limitations and advantages of formatted input include:
§ Data must be in fixed columns.
§ Data fields may be read in any order.
§ Blanks need not separate fields.
§ Input only the variables you need and skip the rest.
§ Read in data using informat specifications.

Format      Specifies values ...                                                          Example
COMMAw.d    that contain commas and decimal places                                        comma8.2
DOLLARw.d   that contain dollar signs and commas                                          dollar6.2
MMDDYYw.    as date values of the form 09/12/97 (MMDDYY8.) or 09/12/1997 (MMDDYY10.)      mmddyy10.
w.          rounded to the nearest integer in w spaces                                    7.
w.d         rounded to d decimal places in w spaces                                       8.2
$w.         as character values in w spaces                                               $12.
DATEw.      as date values of the form 16OCT99 (DATE7.) or 16OCT1999 (DATE9.)             date9.

1.2.4 Named Input
You can use named input to read records in which data values are preceded by the name of the variable and an equal sign (=). The following INPUT statement reads the data lines containing equal signs.

data games;
  input name=$ score1= score2=;
  datalines;
name=riley score1=1132 score2=1187
;

proc print data=games;
run;

Note: All forms of input, except named input, can be used in any combination.

1.2.5 Multiple Styles in a Single INPUT Statement

data mul;
  input idno name $18. team $ 24-30 startwght endwght;
  cards;
023 David Shaw         red    189 165
049 Amelia Serrano     yellow 189 165
;
run;

The values of IDNO, STARTWGHT, and ENDWGHT are read with list input, the value of NAME with formatted input, and the value of TEAM with column input.

1.3 Pointer Controls
As SAS reads values from the input data records into the input buffer, it keeps track of its position with a pointer.
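
Before any pointer controls are applied, the pointer's default movement can be seen in a minimal sketch that mixes column input with list input. The data set name ptr_demo, the variable names, and the data values are illustrative assumptions, not part of the original notes. After the column-input read of columns 1-10 the pointer rests at column 11, so the list-input read of age starts scanning from there.

```sas
/* Hypothetical example: names and values are assumptions for illustration */
data ptr_demo;
  /* name: column input, columns 1-10; the pointer then sits at column 11 */
  /* age:  list input, scans forward from column 11 for the next value    */
  input name $ 1-10 age;
  datalines;
Ann Lee   34
Bob Stone 27
;
run;

proc print data=ptr_demo;
run;
```

Because the embedded blank in each name falls inside the column range, it is kept as part of the name rather than treated as a delimiter.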
The INPUT statement provides three ways to control the movement of the pointer:
1.3.1 column pointer controls: reset the pointer's column position when the data values in the data records are read, such as @n, +n
1.3.2 line pointer controls: reset the pointer's line position when the data values in the data records are read, such as #n, slash (/)
1.3.3 line-hold specifiers: hold an input record in the input buffer so that another INPUT statement can process it, such as @ and @@. By default, the INPUT statement releases the previous record and reads another record.

Summary: Pointer Controls:
@n   go to column n
+n   move the pointer n positions
@    hold the current input line and re-read certain variables
@@   useful when each input line contains values for several observations
/    skip to the next line of raw data
#n   stands for the number of the line of raw data for that observation

Example:

data p;
  input @1 x 3. +1 y z 3.;
  cards;
101 29 169
102 30 174
103 35 172
;
run;

More Examples for Multiple Observations per Line or Multiple Lines per Observation:

If more than one observation exists on a line, use the @@ option at the end of the INPUT statement.

Data pc;
  Input a b c @@;
  Cards;
1 1 1 2 2 2 3 3 3
4 4 4
;
run;

data mm;
  input age gender $ / weight height #3 education $ income;
  datalines;
35 M
145 174
College 58000
32 F
120 163
High 34000
;
run;

If a single observation is spread across more than one line, the @@ option holds the current record until all variables have been read. (List input also moves to the next line automatically when it runs out of values on the current line.)

Data pc2;
  Input a b c @@;
  datalines;
1 1
1
2 2 2
3
3
3
4 4
4
;
run;

1.4 Datalines & Cards
DATALINES and CARDS perform the same task. Use the DATALINES statement with an INPUT statement to read data that you enter directly in the program, rather than data stored in an external file. You can use only one DATALINES statement in a DATA step.

1.5 Line Hold Specifiers
Line hold specifiers are used to maintain the position of the line and column pointers on the current line in the external file through multiple INPUT statements or multiple iterations of a single DATA step.
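
As one illustration of holding a record across multiple INPUT statements, the sketch below reads a record-type code first, holds the line with a trailing @, and then chooses the second INPUT statement based on that code. The data set name held, the variables type, amount, and code, and the data lines are all assumptions made for this example.

```sas
/* Hypothetical example: names and values are assumptions for illustration */
data held;
  input type $ 1 @;                       /* read column 1, then hold the record */
  if type = 'A' then input @3 amount 5.;  /* numeric field in columns 3-7        */
  else input @3 code $5.;                 /* character field in columns 3-7      */
  datalines;
A 12345
B XYZ12
;
run;

proc print data=held;
run;
```

The second INPUT statement has no trailing @, so the record is released and the next iteration starts on a new line.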
Placed at the end of the INPUT statement, line-hold specifiers instruct SAS not to read a new record when the next INPUT statement is executed. This capability is the key element of techniques used to read more complex files and to improve efficiency.
@ (trailing at-sign) tells SAS to keep this record current until either an INPUT statement is executed without a trailing @ or trailing @@, or until this iteration of the DATA step is completed.

1.6 INFILE Statement
- Identifies an external file to read with an INPUT statement.
- Is valid in a DATA step.
- MUST appear BEFORE the INPUT statement that reads data from the file.
- Informs SAS where the data file is located and the name of the file.

Syntax:
INFILE file-specification <options> <host-options>;

Reading an external file with the INFILE statement and common options

/* Example 1: Reading external file or existing file with DLM option */
Data my2;
  length custom_id $ 6 department $20;
  infile 'a:mydata2.txt' DLM=',' FIRSTOBS=2;
  input custom_id $ department $;
run;

Note: DLM: This option allows you to tell SAS what character is used as a delimiter in a file. If this option is not specified, SAS assumes the delimiter is a space. Some common delimiters are comma, vertical pipe, semicolon, and tab. In the program above, DLM=',' indicates that commas are used to separate variables within the text file. If the data had tab characters between values instead of commas, then you could use the following program to read the file.

data tab;
  infile 'a:tab.txt' dlm='09'x;
  input x y z;
run;

/* Example 2: Reading external file or existing file with DSD option */
DSD (delimiter-sensitive data) has three functions when reading delimited files. The first function is to strip off any quotes that surround values in the text file. The second function deals with missing values. When SAS encounters consecutive delimiters in a file, the default action is to treat the delimiters as one unit.
If a file has consecutive delimiters, it is usually because there are missing values between them. DSD tells SAS to treat consecutive delimiters separately; therefore, a value that is missing between consecutive delimiters will be read as a missing value when DSD is specified. The third function is to set the default delimiter to a comma. If DSD is specified and the delimiter is a comma, the DLM= option is not necessary. If another delimiter is used, the DLM= option must be used as well.

data temp;
  infile cards dsd dlm=',';   /* strip off quotes */
  input a b c d;
  cards;
"54","75","253","44"
"87","3","55","465"
"905","","66","354"
;
run;

data temp2;
  infile cards dsd;   /* treat consecutive delimiters separately */
  input a b c;
  cards;
54,,44
87,55,46
90,,35
;
run;

data temp3;
  infile 'a:phone.txt' d
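
The three DSD behaviours described in this section (quote stripping, consecutive delimiters read as missing values, and the comma as default delimiter) can be checked together with in-stream data. This is a minimal sketch: the data set name dsd_demo, the variable names, and the data values are assumptions made for illustration. It also relies on the fact that DSD treats a comma inside a quoted value as data rather than as a delimiter.

```sas
/* Hypothetical example: names and values are assumptions for illustration */
data dsd_demo;
  length name $ 12;        /* room for values longer than 8 characters    */
  infile datalines dsd;    /* DSD alone: comma is the default delimiter   */
  input name $ score1 score2;
  datalines;
"Smith, J",90,85
Lee,,70
;
run;

proc print data=dsd_demo;
run;
```

In the second data line the two consecutive commas leave score1 missing while score2 is still read from the third field.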
