
#13 Adding New Models

1.0
open
nobody
None
2018-05-28
2018-05-09
No

I've updated models.py to include the RAP:

#-------------------------------------------
# List of models available for ingest.
#-------------------------------------------

MODELS = [
    {
        'mname': 'nam',
        'url': "http://nomads.ncep.noaa.gov/pub/data/nccf/com/nam/prod/",
        'ddir': "/scratch/data/cdf/nam/218",
        'fglob': ('nam.{runtime}.awip12*.grib2',
                  'nam.{runtime}.awphys*.grib2'),
        'times': (0, 6, 12, 18),
        'template': 'nam218.cdf'
    },
    {
        'mname': 'gfs',
        'url': "http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/",
        'ddir': "/scratch/data/cdf/gfs/230",
        'fglob': ("gfs.{runtime}.pgrb2full.*.f??",
                  "gfs.{runtime}.pgrb2full.*.f???"),
        'times': (0, 6, 12, 18),
        'template': 'gfs230.cdf'
    },
    {
        'mname': 'rap',
        'url': "http://nomads.ncep.noaa.gov/pub/data/nccf/com/rap/prod/",
        'ddir': "/scratch/data/cdf/rap",
        'fglob': ('nam.{runtime}.awip12*.grib2',
                  'nam.{runtime}.awphys*.grib2'),
        'times': (0, 3, 6, 9, 12, 15, 18, 21),
        'template': 'rap.cdf'
    },

    # {
    #     'mname': 'dgex',
    #     'url': "http://nomads.ncep.noaa.gov/pub/data/nccf/com/dgex/prod/",
    #     'ddir': "/scratch/data/cdf/dgex/185",
    #     'fglob': ('dgex_conus.{runtime}.*',),
    #     'times': (6, 18),
    #     'template': 'dgex.cdf'
    # }

]

#-------------------------------------------
# The default list of models to ingest.
#-------------------------------------------

ENABLED = ['nam', 'rap']

But when I run ifp.ingest --models rap, it says there is no known model named rap.

What's going on here?

Discussion

  • Mike Romberg

    Mike Romberg - 2018-05-09

    Adding the nomads URL is just one of many steps involved in adding new model data, so doing that alone is not enough to make the data flow. The steps involved in getting a new model working are:

    • Identify which files on nomads hold the data we are interested in. This does not seem to be documented anywhere I can find at NCEP. The file names must mean something to someone, but the naming convention remains a mystery to me. Once the needed files are identified, configure the ingest script (as you have done) to download them and send them to the grib decoder.
    • The data inside the grib files is also not documented. So it is a bit of a guessing game to figure out which variable names correspond to things like temperature, pressure, surface, boundary layers, etc. There are some tables in the ingest code that mostly work for the nam and gfs data I was able to work with. But it would not surprise me at all to find other models that use different names for the same things. Maybe there is documentation out there. If not, it is a matter of dumping the grib data and guessing at what it is.
    • Once one has a handle on what the data is, then a template .cdl needs to be created to hold the data in netcdf form. These templates only exist for a few models right now. I know rap does not have one.
    • When the data is converted from grib form into netcdf, it is still not fully ingested. It should be visible and usable in the gfe as a D2D model, but there will be no resolution conversion and no sensible surface parameters generated. This is all done by the ifpInit code, so ifpInit will need to be configured to process these D2D databases into IFP databases. I think there is some old code for RAP in the source tree, but it probably won't work right out of the box. Some adjustments will be needed.

    All of this could be made a whole lot easier if any documentation could be found which describes:

    1. How are the ncep files named? What data is generally in each named file?
    2. What do the grib variables mean? How are the parameters, levels, etc. named?

    I have not been able to find any substantial, complete and detailed documentation for this. If some could be found, it would be much more straightforward to add more models.

     
  • Mike Romberg

    Mike Romberg - 2018-05-09

    The RAP example you are trying is a good illustration of the documentation problem we face in getting this data into a form usable by the gfe. In this directory:

    http://nomads.ncep.noaa.gov/pub/data/nccf/com/rap/prod/

    You will find subdirectories named with five different naming conventions. What is the difference between these names? What do they mean? Some simple and obvious guesses could be made as to what some parts of the names mean, but it would be awesome if a document could be located that spelled it out. Otherwise it is all guessing.

    Let's guess that the model data we are after is in directories named like this:
    

    http://nomads.ncep.noaa.gov/pub/data/nccf/com/rap/prod/rap.20180508/

    Now the same problem repeats itself. Exactly which files are needed? What is in them? There is some kind of naming convention used, but I can't find it documented anywhere.
    
    If one starts opening each of the above grib files with a grib decoder, the same problem repeats itself. There are variable names to be guessed at. What do they mean? Some can be easily guessed and others are just a shot in the dark.
    
     
  • Michael Foland

    Michael Foland - 2018-05-09

    "rap.t00z.awip32f01.grib2 "
    Looking there, we see rap.t00z, where t00z is the run time (00Z), so the script I edited would generate 00, 06, 12, 15, etc.

    Not sure what the 32 is for in awip32; however, f01 is the frame (forecast hour) number. It appears the rap has 21 frames. I wonder if you could have it look for anything matching awip* and have it grab the files with the .grib2 extension :)
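    A minimal sketch of pulling the pieces out of a RAP file name with a regular expression, assuming the reading above is right (t00z = run time, awip32 = product, f01 = forecast hour). NCEP has not confirmed this layout, and `parse_rap_name` is a hypothetical helper, not part of the ingest code.

```python
import re

# Hypothetical pattern inferred from one RAP file name seen on NOMADS:
#   rap.t00z.awip32f01.grib2
# Assumed meaning: 't00z' = cycle (run) hour, 'awip32' = product/grid name,
# 'f01' = forecast hour. Not confirmed by any NCEP documentation.
RAP_NAME = re.compile(
    r"^rap\.t(?P<cycle>\d{2})z\.(?P<product>[a-z]+\d*)f(?P<fhour>\d{2,3})\.grib2$"
)


def parse_rap_name(name):
    """Return (cycle_hour, product, forecast_hour), or None if the name does not fit."""
    m = RAP_NAME.match(name)
    if m is None:
        return None
    return int(m.group("cycle")), m.group("product"), int(m.group("fhour"))
```

    Anything that does not fit the assumed pattern (for example the `.idx` index files) comes back as `None`, so the function can double as a filter over a directory listing.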

     
  • Allan Diegan

    Allan Diegan - 2018-05-27

    Here is a quick guide: http://www.nco.ncep.noaa.gov/pmb/products/nam/nomads.shtml

    Using the NAM from NOMADS, we would be interested in the 12km CONUS, which is listed as nam.tCCz.awphysFF.grb2.tm00. CC is the cycle time (00, 06, 12, 18 UTC) and FF is the forecast hour. Since the NAM12 only goes to 84 hours, we would see awphys00.grb2.tm00 through awphys84.grb2.tm00.
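    As a sketch of the naming convention just described, the following builds the expected NAM file names for one cycle. It is an illustration under assumptions: it uses the `.tm00.grib2` suffix order that the file names in this comment show (nam.t00z.awphys00.tm00.grib2) rather than the guide's `.grb2.tm00`, it assumes one file per forecast hour unless a `step` is given (the real hour spacing should be checked against the directory listing), and `nam_awphys_names` is a hypothetical helper.

```python
NAM_CYCLES = (0, 6, 12, 18)  # cycle times in UTC, per the NCO guide


def nam_awphys_names(cycle, last_hour=84, step=1):
    """Hypothetical helper: expected awphys file names for one NAM cycle.

    Assumes files are named nam.tCCz.awphysFF.tm00.grib2 and exist every
    `step` hours from 00 through `last_hour`.
    """
    if cycle not in NAM_CYCLES:
        raise ValueError("NAM cycles are 00, 06, 12 and 18 UTC")
    return [
        f"nam.t{cycle:02d}z.awphys{fh:02d}.tm00.grib2"
        for fh in range(0, last_hour + 1, step)
    ]
```

    Comparing this list against what is actually on the server is a quick way to see which forecast hours are missing.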

    We also note two .grib2 files such as:

    nam.t00z.awphys00.tm00.grib2
    nam.t00z.awphys00.tm00.grib2.idx

    It should be obvious from the file sizes that the one we want is the 54M file.
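    The same choice can be made in code. A small sketch, assuming the directory listing is available as (name, size) pairs; the `.idx` files are the small index (inventory) files that sit next to each GRIB2 file. Both helpers are hypothetical.

```python
def grib_files_only(listing):
    """Drop the small .idx index files, keeping the GRIB2 data files.

    `listing` is assumed to be an iterable of (file_name, size_in_bytes) pairs.
    """
    return [(name, size) for name, size in listing if not name.endswith(".idx")]


def largest(listing):
    """The biggest remaining file: a crude stand-in for 'the one with the data'."""
    return max(grib_files_only(listing), key=lambda item: item[1])
```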

    Each model has parameters that are listed. For the NAM12, I believe the ones we would want are

    tmpsfc for surface temperature
    apcpsfc for precipitation
    gustsfc for wind gust
    rh2m for surface relative humidity

    etc. I can get the complete list when and if you are ready for it.

    If I am missing something that you have a question about please let me know and I will quickly get back to you.

    EDIT: As I look through models.py, GFE looks to be ingesting the correct NAM, so I will continue to look further into why it says "No grids to copy".

     

    Last edit: Allan Diegan 2018-05-27
  • Michael Foland

    Michael Foland - 2018-05-27

    How do we even get it to ingest the RAP, or even pull fields like CAPE out of a model run? I'm not sure how to edit the template file.

     
  • Mike Romberg

    Mike Romberg - 2018-05-28

    The configuration found in ingest/models.py is just the basics of what needs to be set up to successfully ingest a new model. The items in each configuration entry are as follows:

    mname - The model name
    url - The URL where the specific files (specified in fglob below) are found
    ddir - The destination directory, where the ingestor will place netcdf and work files
    fglob - A list of glob patterns (*.txt, file_$.txt, etc.) that specify which files to get
    times - The model run times on a 24-hour clock
    template - The netcdf template file to place decoded grib grids into (see below)
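    One way to make the fields above explicit is a typed record with a basic sanity check. This is a hypothetical sketch, not the structure ingest/models.py actually uses: `ModelEntry` and `problems()` are illustrations, and the check assumes every template `X.cdf` is built from `ingest/templates/X.cdl`, following the nam218.cdf / nam218.cdl pairing described in this comment.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical typed view of one MODELS entry."""
    mname: str              # model name, e.g. 'nam'
    url: str                # NOMADS directory holding the files
    ddir: str               # destination for netcdf and work files
    fglob: Tuple[str, ...]  # glob patterns selecting files to download
    times: Tuple[int, ...]  # run hours on a 24-hour clock
    template: str           # netcdf template, e.g. 'nam218.cdf'

    @classmethod
    def from_dict(cls, d):
        return cls(d["mname"], d["url"], d["ddir"],
                   tuple(d["fglob"]), tuple(d["times"]), d["template"])

    def problems(self, template_dir):
        """List obvious configuration problems (illustrative checks only)."""
        found = []
        if not all(0 <= t < 24 for t in self.times):
            found.append("times must be hours 0-23")
        # Assumption: 'rap.cdf' would be generated from '<template_dir>/rap.cdl'.
        cdl = Path(template_dir) / (Path(self.template).stem + ".cdl")
        if not cdl.exists():
            found.append(f"no template source {cdl}")
        return found
```

    For the `rap` entry in the question, this check would flag the missing `rap.cdl`, which matches the earlier point that no RAP template exists yet.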

    So, the ingest framework will start looking for matching files using the url, model run time, current time of day and fglob patterns. When files are found, they will be downloaded. This is only the first of many steps. The next step involves the template.
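    The matching step can be sketched with Python's standard `fnmatch` module. This is not the framework's code. In particular, it assumes `{runtime}` is filled with a string like `t12z`, which fits the NAM file names in this thread but is a guess. It also ignores the delay before a run's files appear and the date-directory change when falling back to the previous day's last run. Both helpers are hypothetical.

```python
from fnmatch import fnmatch


def latest_runtime(times, now):
    """Most recent run hour in `times` at or before now.hour, as 'tHHz'.

    The 'tHHz' format is an assumption about what {runtime} expands to.
    Before the first run of the day this falls back to the last run hour,
    which really belongs to the previous day (date handling not shown).
    """
    past = [t for t in sorted(times) if t <= now.hour]
    hour = past[-1] if past else max(times)
    return f"t{hour:02d}z"


def matching_files(listing, fglob, runtime):
    """File names from a directory listing that match any fglob pattern."""
    patterns = [p.format(runtime=runtime) for p in fglob]
    return [name for name in listing if any(fnmatch(name, p) for p in patterns)]
```

    Run against a RAP directory, the `rap` entry in the question, which reuses the `nam.{runtime}...` patterns, would match nothing at all.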

    When one grib file is downloaded for a new model run, a template file is copied to the actual destination location and named to correspond to the old AWIPS I convention. This template file is created from a text-based file found in ingest/templates. For example, if you look at ingest/templates/nam218.cdl, you can see text definitions for what the netcdf file will be.
    
    This netcdf definition follows no convention either. It is what was referred to in AWIPS I as "D2D netcdf". Important things that must be set up in this template are the names of the variables (t, cp, tp, etc.), the vertical levels, the grid dimensions, the mapping information, etc. All of this must match what is actually in the grib files, or "stuff won't work" :).
    
    Anyway, after the template cdf file is copied to the destination directory, the ingest code scans through each downloaded grib file and looks for variables that match variables in the netcdf file. This too is not a straightforward process, because the two formats do not use the same names for the same things. So the file ingest/gribNames.py contains a table that maps grib variable names to netcdf names. This table worked for ngm, gfs and nam, but it would not surprise me at all if other names need to be added here too.
    
    Other factors, such as how a grib file specifies the vertical level, are more or less hard-coded in ingest/nomads.py. This code mostly worked for gfs, nam and ngm, but may need to be tweaked for another model. Anyway, assuming the ingestor finds a match for a grid's variable name and level, it will copy the grid into the netcdf file.
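    A sketch of the name-and-level lookup described in the last two paragraphs. The table contents and level labels are hypothetical: wgrib2-style abbreviations on the grib side, and short names like `t` and `rh` on the netcdf side. The real mapping lives in ingest/gribNames.py and the level handling in ingest/nomads.py, and neither is reproduced here.

```python
# Hypothetical table: grib abbreviation -> D2D netcdf variable name.
# Illustrative only; NOT the contents of ingest/gribNames.py.
GRIB_TO_NETCDF = {
    "TMP": "t",
    "RH": "rh",
}


def netcdf_target(grib_name, level, template_levels):
    """Return (netcdf_var, level) if the template has a slot for this grid.

    template_levels maps each netcdf variable in the template to the level
    labels it defines, e.g. {"t": ("SFC", "MB500")} (labels are made up).
    Returns None when either the name or the level has no match; in this
    sketch such a grid would simply be skipped.
    """
    var = GRIB_TO_NETCDF.get(grib_name)
    if var is None:
        return None
    if level not in template_levels.get(var, ()):
        return None
    return var, level
```

    The point of the sketch is that both lookups have to succeed. A correct variable name with a level the template does not define is dropped just as quietly as an unknown name.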
    
    Once data is in the netcdf file (and many things can go wrong along the way), the ifpServer will see it as a "D2D" database. At this point the grids will still be in "raw" form: they will still be in whatever units the grib file had, and no derived parameters (such as min/max T) will be generated. But the server will remap them to match what the rest of the gfe uses. To be fully ingested, the server will need to be configured to run an ifpInit script, which is a whole different system.
    
    The whole thing is really messy. It could be made easier, but that is hard to do without much documentation on the NOMADS side. In the end, neither grib nor "D2D netcdf" is a very good format for holding this kind of data. It would be great if some model were available in an easier-to-comprehend format, but for now this seems the best that can be done. I would take a stab at adding some more models if any kind of detailed docs could be found. So far I've not discovered much to go on.
    
     
  • Michael Foland

    Michael Foland - 2018-05-28

    What if you went here and wrote a support ticket: http://nomads.ncep.noaa.gov:9090/

     
  • Allan Diegan

    Allan Diegan - 2018-05-28

    I was able to locate the gribNames and the nam218 file. I think the names for stuff like temperature should be changed from T to tmpsfc, or rh to rhprs. I found the names for these products at nomads.ncep.noaa.gov:9090/. I would give you the full link, but apparently nomads is down tonight. This page also gives information about the dimensions of the grid, etc. I will look more into it when it's back online.

    The odd problem is that the NAM repeatedly shows only 1 hour of data (usually close to the model's initialization time). For example, if it downloads the 06Z run and I click to copy grids, I may get the 07Z and 08Z data to show up. Nothing else shows up, and sometimes I get a "no grids to copy" message when there is clearly a cdf file in the folder. I also let it run for hours, even toward the next model run. That part is throwing me for a loop. But if even one hour of data loads correctly, then apparently the grid parameter names are correct, at least for the NAM.

    I will update when the server is back up.

    Thanks so much!

     
  • Allan Diegan

    Allan Diegan - 2018-05-28

    Nomads is back up. This is the link I was talking about, if you click on info: http://nomads.ncep.noaa.gov:9090/dods/nam/nam20180528

    This site gives a description of the 141 variables that are possible in the NAM. For example, maximum temperature is listed as tmax2m and minimum temperature as tmin2m. Unless a variable is specifically given at the surface (...sfc), use the 2m (2 meter) version. For example, they do not provide surface relative humidity, so we would want 2 meter relative humidity, listed as rh2m. Variables that span multiple layers of the atmosphere, such as temperature at the 300mb, 500mb and 700mb levels, are listed as tmpprs. I do understand this part, and I notice that the names of variables listed in gribNames are different from the variables listed in nomads. I believe temperature is listed as T when it should be tmpsfc (temperature, surface).
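    The naming pattern described above (a quantity abbreviation followed by a level suffix) can be split mechanically. A minimal sketch, assuming only the three suffixes mentioned here (`sfc`, `2m`, `prs`). NOMADS uses many others, so the hypothetical `split_dods_name` returns `None` for anything it does not recognize.

```python
# Only the suffixes mentioned in this comment; NOMADS uses many more.
LEVEL_SUFFIXES = {
    "sfc": "surface",
    "2m": "2 m above ground",
    "prs": "pressure levels",
}


def split_dods_name(name):
    """Split e.g. 'tmp2m' into ('tmp', '2 m above ground').

    Returns None when the name does not end in a known suffix.
    Longer suffixes are tried first so a short one cannot shadow a long one.
    """
    for suffix, meaning in sorted(LEVEL_SUFFIXES.items(), key=lambda kv: -len(kv[0])):
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], meaning
    return None
```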

    It does list dimensions such as lat, long and altitude as points (42 points or levels for the NAM). I'm not really familiar with this part, in regards to what is listed in the gribNames file and how, or if, it differs from what nomads says.

    If you go to http://nomads.ncep.noaa.gov:9090/dods/ you can select any model that is available from nomads, and it should give you the variable names and such. I am hoping this helps, as this is all that I can provide. I may be able to provide insight into what the variables mean, but this may be the extent of my knowledge, especially when it comes to python, so excuse me if you have already mentioned something you do not have documentation for and I didn't provide it in this comment.

    As I mentioned, nomads is back up and running. It seems to be downloading data as expected and degribbing it (whether correctly, I do not know). I will see if it is able to copy more than 1 hour worth of grids this time. I am starting to wonder if the reason it says "no grids to copy" is that the variable names, vertical levels, or map properties may not match what nomads has (I think nomads changed variable names at some point). I say that because the definition for temperature is 'T' while nomads says it should be 'tmpsfc'. However, like I said before, once in a blue moon it will copy 1 hour worth of grids and they seem to display correctly. But 8 times out of 10 it says "no grids to copy". As you can see, I am confused and uncertain what the actual issue could be.

    I will be working on this more today.

     
  • Mike Romberg

    Mike Romberg - 2018-05-28

    Thanks for the link! I think I may have bumped into the nam documentation before, which is why nam is implemented in ifp.Ingest. What I failed to find is similar documentation for any other models :).

    As far as variable names go, there are three sets of names involved for the same data. First, there is the name of the data in the grib file. Decoding grib is difficult, error-prone and slow, so AWIPS I had an ingest system which did that and stored everything of interest in netcdf files. Everything else in AWIPS I (including the gfe) then used these netcdf files, which were much faster and less error-prone.
    
    Of course, the variable names were sometimes changed during the move from grib -> netcdf, probably for arbitrary reasons. Anyway, the data in these netcdf files was not directly usable by the gfe for many reasons, and the gfe has a third naming convention for parameters. So the ifp.Init program converts the "D2D netcdf" into a form that the gfe can use: remapping it onto a 5km grid, adjusting for the surface using high-res topo, deriving MinT and MaxT, combining the u and v wind components, converting units, etc.
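    A couple of the conversion steps listed above are simple enough to sketch on single values. This is illustrative Python, not ifpInit code. It assumes the raw temperatures are in Kelvin, as GRIB temperatures usually are, and that wind direction should follow the meteorological convention (the direction the wind blows from, in degrees clockwise from north).

```python
import math


def kelvin_to_fahrenheit(k):
    """Convert a temperature from Kelvin to degrees Fahrenheit."""
    return (k - 273.15) * 9.0 / 5.0 + 32.0


def wind_from_uv(u, v):
    """Combine u (eastward) and v (northward) components.

    Returns (speed, direction) where direction is the meteorological
    'from' direction in degrees: 0 = from north, 270 = from west.
    Speed is in the same units as u and v.
    """
    speed = math.hypot(u, v)
    direction = (math.degrees(math.atan2(-u, -v)) + 360.0) % 360.0
    return speed, direction
```

    On real data the same formulas would be applied element-wise to whole grids. The hard parts of ifpInit (remapping, topography adjustment) are not attempted here.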
    
    So a variable may have three different names along the way. Some variable may be called "temp" in grib, "t" in the D2D netcdf and "T" in the gfe. Sometimes humans can sort this all out. Computers have a heck of a time figuring out that these are really different names for the exact same data :).
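    The three-name problem can at least be made explicit with a lookup table. This is a hypothetical sketch: only the temp / t / T row comes from the example above, and any other rows would have to be filled in by hand for each parameter and model.

```python
# Hypothetical cross-reference table; only the first row comes from the text.
NAME_CHAIN = [
    {"grib": "temp", "d2d": "t", "gfe": "T"},
]


def translate(name, src, dst, chain=NAME_CHAIN):
    """Look up the `dst` name ('grib', 'd2d' or 'gfe') for a `src` name."""
    for row in chain:
        if row[src] == name:
            return row[dst]
    raise KeyError(f"no {dst} name known for {src} name {name!r}")
```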
    
     

