What's so special, really, about the egen (extensions to generate) command? The answer is that it lets you do a lot with your data. Things that might take many commands in other statistical programs can be done with a couple of egen commands. So this is really the next phase of data manipulation.

What egen does is determined by the function you give it. There are many of these functions, all described in help egen, and the following sections of this step describe the most commonly used ones. These examples will hopefully clarify how to use the different functions and how they can help us.

egen store_mean_price = mean(price), by(store_id)

This example creates a variable in which, for each observation, the value is the mean price of all observations that have the same store_id. See the figure under rowmean() for a graphic illustration.

egen mean_firm_occupation_wage = mean(wage), by(firm_id occupation_id)

This puts, for each observation, the mean wage of all observations (including that observation itself) with the same firm_id and occupation_id. You can also omit the by option. egen then puts the mean of the original variable over all observations in the dataset into every observation.

The function rowmean() also computes means. However, instead of computing the mean of one variable across observations, it computes the mean across variables, separately for each observation. Suppose you had a dataset of students and their scores:

egen mean_score = rowmean(math_score physics_score chemistry_score)

This example simply creates a new variable, mean_score, which holds the mean of the math, physics and chemistry scores for each student.

Because rowmean() computes its result separately for each observation, the by option is irrelevant (help egen states that rowmean() may not be combined with by). Take the previous example: there is no point in adding a by(class_id) option to the egen command when using rowmean(). If you want the class mean for one of the subjects (the mean score across students), use the mean() function instead of rowmean(). If you want the class mean (across students) of each student's mean score (across subjects), you need both: first use mean() and then rowmean(), or vice versa.

You might ask what the difference is between rowmean() and simply using the gen command:

gen mean_score = (math_score + physics_score + chemistry_score) / 3

There are two differences.

Wildcards: with rowmean() you can use wildcards, so the same rowmean command can be written with a variable pattern such as *_score instead of listing every variable by name. This is very useful if the list of variables is very long. It also helps if you think you might later add english_score, history_score and so on to your dataset and you don't want to update this command every time. To learn more about wildcards, see help varlist.

Missing values: if one of the variables mentioned above is missing, the gen command cannot sum the three variables and therefore puts a missing value in mean_score for that observation. This is the case even if the other two scores are not missing. egen rowmean, on the other hand, disregards the missing values and computes the mean of only the nonmissing values in the variable list. egen rowmean puts a missing value in the generated variable (just like the gen command) only if all of the variables specified are missing. It's up to you to decide which behavior is better for your analysis.
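The mean() examples can be checked with a small, self-contained sketch. The toy data are hypothetical: four prices and two store IDs, invented only so the results are easy to verify by hand. The sketch computes the per-store mean with the by() option as in the text. For comparison, it also computes the overall mean with by() omitted.

```stata
* Hypothetical toy data: two stores, two prices each
clear
input store_id price
1 10
1 20
2 30
2 50
end

* Per-store mean: every observation gets the mean price of its own store_id
egen store_mean_price = mean(price), by(store_id)

* No by(): every observation gets the mean over the whole dataset
egen overall_mean_price = mean(price)

list, sepby(store_id)
```

Both store 1 observations should show 15 in store_mean_price and both store 2 observations should show 40. overall_mean_price should be 27.5 on every row, since (10 + 20 + 30 + 50) / 4 = 27.5.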
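The difference in missing-value handling between rowmean() and a plain gen expression shows up in a hypothetical three-student dataset. The values are invented for illustration. One student has all three scores, one is missing physics_score, and one is missing every score. The sketch creates the rowmean() result and the gen result side by side.

```stata
* Hypothetical toy data; "." is Stata's missing value
clear
input student_id math_score physics_score chemistry_score
1 80 90 70
2 60 .  90
3 .  .  .
end

* rowmean(): averages whatever scores are nonmissing in each row
egen mean_score = rowmean(math_score physics_score chemistry_score)

* gen: any missing term makes the sum, and so the result, missing
gen mean_score_gen = (math_score + physics_score + chemistry_score) / 3

list
```

Student 1 gets 80 from both commands. Student 2 gets 75 from rowmean(), the mean of 60 and 90, but a missing value from gen. Student 3 gets a missing value from both.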
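The wildcard form of rowmean() can be sketched as follows. This assumes, hypothetically, that every subject score variable in the dataset ends in _score. The new variable is named score_avg here rather than mean_score so that it does not itself match *_score if the pattern is reused later in the same dataset.

```stata
* Hypothetical toy data with four subject scores
clear
input student_id math_score physics_score chemistry_score english_score
1 80 90 70 60
2 50 70 .  90
end

* *_score expands to every variable whose name ends in _score
egen score_avg = rowmean(*_score)

* ds shows which variables the pattern matches, to check what was included
ds *_score

list
```

Adding, say, history_score to the dataset later would be picked up by the same command without editing it. Row 1 averages all four scores (75). Row 2 ignores the missing chemistry_score and averages the other three (70).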