NCDatasets.jl

Documentation for NCDatasets.jl

Datasets

Dataset(filename::AbstractString,mode::AbstractString = "r";
                 format::Symbol = :netcdf4, attrib = [])

Create a new NetCDF file if the mode is "c". An existing file with the same name will be overwritten. If mode is "a", then an existing file is opened in append mode (i.e. existing data in the netCDF file is not overwritten and a variable can be added). With the mode set to "r", an existing netCDF file or OPeNDAP URL can be opened in read-only mode. The default mode is "r". The optional parameter attrib is an iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below).

Supported formats:

  • :netcdf4 (default): HDF5-based NetCDF format.

  • :netcdf4_classic: Only netCDF 3 compatible API features will be used.

  • :netcdf3_classic: classic netCDF format supporting only files smaller than 2GB.

  • :netcdf3_64bit_offset: improved netCDF format supporting files larger than 2GB.

Files can also be opened and automatically closed with a do block.

Dataset("file.nc") do ds
    data = ds["temperature"][:,:]
end

Dataset("file.nc", "c", attrib = ["title" => "my first netCDF file"]) do ds
    defVar(ds,"temp",[10.,20.,30.],("time",))
end;

mfds = Dataset(fnames,mode = "r"; aggdim = nothing)

Opens a multi-file dataset in read-only ("r") or append ("a") mode. fnames is a vector of file names. Variables are aggregated over the first unlimited dimension, or over the dimension aggdim if specified.

Note: all files are opened at the same time, but the operating system might limit the number of open files. On Linux, the limit can be controlled with the command ulimit [1,2].

All variables containing the dimension aggdim are aggregated. Variables that do not contain the dimension aggdim are assumed to be constant.

[1] https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux
[2] https://unix.stackexchange.com/questions/8945/how-can-i-increase-open-files-limit-for-all-processes/8949#8949
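
The aggregation rule can be pictured with plain Julia arrays. The following is only a sketch and not NCDatasets code: the two "files" are hypothetical in-memory dictionaries, and the helper aggregate is an illustrative name. It concatenates a variable along the aggregation dimension and returns the copy from the first file for variables that do not have this dimension.

```julia
# Each hypothetical "file" maps a variable name to (data, dimension names).
file1 = Dict("temp" => (fill(1.0, 2, 3, 4), ("lon", "lat", "time")),
             "lon"  => ([10.0, 20.0], ("lon",)))
file2 = Dict("temp" => (fill(2.0, 2, 3, 5), ("lon", "lat", "time")),
             "lon"  => ([10.0, 20.0], ("lon",)))

# Aggregate the variable `name` over the dimension `aggdim` (illustrative helper).
function aggregate(files, name, aggdim)
    data1, dims = files[1][name]
    k = findfirst(==(aggdim), dims)
    # Variables without the aggregation dimension are assumed constant,
    # so the copy from the first file is returned unchanged.
    k === nothing && return data1
    # Otherwise the pieces are concatenated along the aggregation dimension.
    return cat((f[name][1] for f in files)...; dims = k)
end

temp = aggregate([file1, file2], "temp", "time")   # 4 + 5 time steps
lon  = aggregate([file1, file2], "lon", "time")    # taken from the first file
```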

Base.keys - Method.
keys(ds::Dataset)

Return a list of all variable names in the Dataset ds.

Base.haskey - Function.
haskey(ds::Dataset,varname)

Return true if the Dataset ds has a variable with the name varname. For example:

ds = Dataset("/tmp/test.nc","r")
if haskey(ds,"temperature")
    println("The file has a variable 'temperature'")
end

This example checks if the file /tmp/test.nc has a variable with the name temperature.

Base.getindex - Method.
getindex(ds::Dataset,varname::AbstractString)

Return the NetCDF variable varname in the dataset ds as a NCDatasets.CFVariable. The CF conventions are honored when the variable is indexed:

  • _FillValue will be returned as missing

  • scale_factor and add_offset are applied

  • time variables (recognized by the units attribute) are returned as DateTime objects.

A call getindex(ds,varname) is usually written as ds[varname].
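
The effect of the first two conventions can be sketched with the standard library alone. This is not the NCDatasets implementation: the raw values, the attribute values and the helper decode are made up for illustration, and only show how a packed value relates to the value returned by indexing.

```julia
# Hypothetical raw values as stored in the file (packed as Int16).
raw = Int16[100, 250, -9999, 300]

# Hypothetical attribute values of the variable.
fillvalue    = Int16(-9999)
scale_factor = 0.5
add_offset   = 20.0

# Replace the fill value by `missing`, then unpack the remaining values.
decode(x) = x == fillvalue ? missing : x * scale_factor + add_offset

decoded = decode.(raw)
```

Indexing the Variable returned by variable(ds,varname) would correspond to raw in this sketch, since no scaling is applied there.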

NCDatasets.variable - Function.
variable(ds::Dataset,varname::String)

Return the NetCDF variable varname in the dataset ds as a NCDatasets.Variable. No scaling is applied when this variable is indexed.

NCDatasets.sync - Function.
sync(ds::Dataset)

Write all changes in Dataset ds to the disk.

Base.close - Function.
close(ds::Dataset)

Close the Dataset ds. All pending changes will be written to the disk.

NCDatasets.path - Function.
path(ds::Dataset)

Return the file path (or the OPeNDAP URL) of the Dataset ds.

Variables

NCDatasets.defVar - Function.
defVar(ds::Dataset,name,vtype,dimnames; kwargs...)
defVar(ds::Dataset,name,data,dimnames; kwargs...)

Define a variable with the name name in the dataset ds. vtype can be any of the Julia types in the table below (listed with the corresponding NetCDF type). Instead of providing the variable type, one can directly give the data data, which will be used to fill the NetCDF variable. The parameter dimnames is a tuple with the names of the dimensions. For a scalar, this parameter is the empty tuple (). The variable is returned (with the type CFVariable).

Note: if data is a vector or array of DateTime objects, then the dates are saved as double-precision floats with the units "days since 1900-00-00 00:00:00" (unless a time unit is specified with the attrib keyword described below).

Keyword arguments

  • fillvalue: A value filled in the NetCDF file to indicate missing data. It will be stored in the _FillValue attribute.

  • chunksizes: Vector of integers setting the chunk size. The total size of a chunk must be less than 4 GiB.

  • deflatelevel: Compression level: 0 (default) means no compression and 9 means maximum compression. Each chunk will be compressed individually.

  • shuffle: If true, the shuffle filter is activated which can improve the compression ratio.

  • checksum: The checksum method can be :fletcher32 or :nochecksum (checksumming is disabled, which is the default)

  • attrib: An iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below)

  • typename (string): The name of the NetCDF type required for vlen arrays [1]

chunksizes, deflatelevel, shuffle and checksum can only be set on NetCDF 4 files.

NetCDF data types

NetCDF Type    Julia Type
NC_BYTE        Int8
NC_UBYTE       UInt8
NC_SHORT       Int16
NC_INT         Int32
NC_INT64       Int64
NC_FLOAT       Float32
NC_DOUBLE      Float64
NC_CHAR        Char
NC_STRING      String

Example:

julia> data = randn(3,5)
julia> Dataset("test_file.nc","c") do ds
          defVar(ds,"temp",data,("lon","lat"), attrib = [
             "units" => "degree_Celsius",
             "long_name" => "Temperature"
          ])
       end;

[1] https://web.archive.org/save/https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf-c/nc_005fdef_005fvlen.html

NCDatasets.dimnames - Function.
dimnames(v::Variable)

Return a tuple of the dimension names of the variable v.

NCDatasets.name - Function.
name(v::Variable)

Return the name of the NetCDF variable v.

NCDatasets.chunking - Function.
storage,chunksizes = chunking(v::Variable)

Return the storage type (:contiguous or :chunked) and the chunk sizes of the variable v.

NCDatasets.deflate - Function.
isshuffled,isdeflated,deflate_level = deflate(v::Variable)

Return compression information of the variable v. If isshuffled is true, then shuffling (byte interlacing) is activated. If isdeflated is true, then the data chunks (see chunking) are compressed using the compression level deflate_level (0 means no compression and 9 means maximum compression).
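
To give an idea of what byte interlacing means, the sketch below regroups the bytes of an array so that all first bytes come first, then all second bytes, and so on. It is written with the standard library only and is not the filter used by the NetCDF library; shuffle_bytes and unshuffle_bytes are hypothetical names. Neighbouring values often share some of their bytes, which is why such a regrouping can help the compressor.

```julia
# Regroup the bytes of `a`: first byte of every element, then second byte, ...
function shuffle_bytes(a::Vector{T}) where {T}
    bytes = collect(reinterpret(UInt8, a))      # bytes stored element after element
    m = reshape(bytes, sizeof(T), length(a))    # one column per element
    return vec(permutedims(m))                  # one byte position after another
end

# Inverse operation: restore the original byte order and element type.
function unshuffle_bytes(::Type{T}, s::Vector{UInt8}) where {T}
    m = reshape(s, :, sizeof(T))                # one column per byte position
    return collect(reinterpret(T, vec(permutedims(m))))
end

data = Float32[1.0, 1.5, 2.0, 2.5]
shuffled = shuffle_bytes(data)
restored = unshuffle_bytes(Float32, shuffled)
```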

NCDatasets.checksum - Function.

checksummethod = checksum(v::Variable)

Return the checksum method of the variable v, which can be either :fletcher32 or :nochecksum.

Different types of arrays are involved when working with NCDatasets. For instance, assume that test.nc is a file with a Float32 variable called var. Assume that we open this data set in append mode ("a"):

using NCDatasets
ds = Dataset("test.nc","a")
v_cf = ds["var"]

The variable v_cf has the type CFVariable. No data is actually loaded from disk, but you can query its size, number of dimensions, number of elements, etc. with the functions size, ndims and length, as for ordinary Julia arrays. Once you index the variable v_cf, the data is loaded and stored in a DataArray:

v_da = v_cf[:,:]

Attributes

The NetCDF dataset (as returned by Dataset or NetCDF groups) and the NetCDF variables (as returned by getindex, variable or defVar) have the field attrib, which has the type NCDatasets.Attributes and behaves like a Julia dictionary.

Base.getindex - Method.
getindex(a::Attributes,name::AbstractString)

Return the value of the attribute called name from the attribute list a. Generally the attributes are loaded by indexing, for example:

ds = Dataset("file.nc")
title = ds.attrib["title"]

Base.setindex! - Method.
Base.setindex!(a::Attributes,data,name::AbstractString)

Set the attribute called name to the value data in the attribute list a. Generally the attributes are defined by indexing, for example:

ds = Dataset("file.nc","c")
ds.attrib["title"] = "my title"

Base.keys - Method.

Base.keys(a::Attributes)

Return a list of the names of all attributes.

Dimensions

NCDatasets.defDim - Function.
defDim(ds::Dataset,name,len)

Define a dimension in the data set ds with the given name and length len. If len is the special value Inf, then the dimension is considered as unlimited, i.e. it will grow as data is added to the NetCDF file.

For example:

ds = Dataset("/tmp/test.nc","c")
defDim(ds,"lon",100)

This defines the dimension lon with the size 100.

Base.setindex! - Method.
Base.setindex!(d::Dimensions,len,name::AbstractString)

Define the dimension called name with the length len. Generally, dimensions are defined by indexing, for example:

ds = Dataset("file.nc","c")
ds.dim["longitude"] = 100

If len is the special value Inf, then the dimension is considered as unlimited, i.e. it will grow as data is added to the NetCDF file.

dimnames(v::Variable)

Return a tuple of the dimension names of the variable v.

Groups

defGroup(ds::Dataset,groupname, attrib = [])

Create the group with the name groupname in the dataset ds. attrib is a list of attribute name and attribute value pairs (see Dataset).

Base.getindex - Method.
group = getindex(g::NCDatasets.Groups,groupname::AbstractString)

Return the NetCDF group with the name groupname. For example:

julia> ds = Dataset("results.nc", "r");
julia> forecast_group = ds.group["forecast"]
julia> forecast_temp = forecast_group["temperature"]

Base.keys - Method.
Base.keys(g::NCDatasets.Groups)

Return the names of all subgroups of the group g.

Common methods

One can iterate over a dataset, attribute list, dimensions and NetCDF groups.

for (varname,var) in ds
    # all variables
    @show (varname,size(var))
end

for (dimname,dim) in ds.dim
    # all dimensions
    @show (dimname,dim)
end

for (attribname,attrib) in ds.attrib
    # all attributes
    @show (attribname,attrib)
end

for (groupname,group) in ds.group
    # all groups
    @show (groupname,group)
end

Time functions

NCDatasets.DateTimeStandard(y, [m, d, h, mi, s, ms]) -> NCDatasets.DateTimeStandard

Construct a NCDatasets.DateTimeStandard type by year (y), month (m, default 1), day (d, default 1), hour (h, default 0), minute (mi, default 0), second (s, default 0), millisecond (ms, default 0). All arguments must be convertible to Int64. NCDatasets.DateTimeStandard is a subtype of AbstractCFDateTime.

The netCDF CF calendars are defined at [1].

[1] https://web.archive.org/web/20180622080424/http://cfconventions.org/cf-conventions/cf-conventions.html#calendar

NCDatasets.DateTimeJulian(y, [m, d, h, mi, s, ms]) -> NCDatasets.DateTimeJulian

Construct a NCDatasets.DateTimeJulian type by year (y), month (m, default 1), day (d, default 1), hour (h, default 0), minute (mi, default 0), second (s, default 0), millisecond (ms, default 0). All arguments must be convertible to Int64. NCDatasets.DateTimeJulian is a subtype of AbstractCFDateTime.

The netCDF CF calendars are defined at [1].

[1] https://web.archive.org/web/20180622080424/http://cfconventions.org/cf-conventions/cf-conventions.html#calendar

NCDatasets.DateTimeProlepticGregorian(y, [m, d, h, mi, s, ms]) -> NCDatasets.DateTimeProlepticGregorian

Construct a NCDatasets.DateTimeProlepticGregorian type by year (y), month (m, default 1), day (d, default 1), hour (h, default 0), minute (mi, default 0), second (s, default 0), millisecond (ms, default 0). All arguments must be convertible to Int64. NCDatasets.DateTimeProlepticGregorian is a subtype of AbstractCFDateTime.

The netCDF CF calendars are defined at [1].

[1] https://web.archive.org/web/20180622080424/http://cfconventions.org/cf-conventions/cf-conventions.html#calendar

NCDatasets.DateTimeAllLeap(y, [m, d, h, mi, s, ms]) -> NCDatasets.DateTimeAllLeap

Construct a NCDatasets.DateTimeAllLeap type by year (y), month (m, default 1), day (d, default 1), hour (h, default 0), minute (mi, default 0), second (s, default 0), millisecond (ms, default 0). All arguments must be convertible to Int64. NCDatasets.DateTimeAllLeap is a subtype of AbstractCFDateTime.

The netCDF CF calendars are defined at [1].

[1] https://web.archive.org/web/20180622080424/http://cfconventions.org/cf-conventions/cf-conventions.html#calendar

NCDatasets.DateTimeNoLeap(y, [m, d, h, mi, s, ms]) -> NCDatasets.DateTimeNoLeap

Construct a NCDatasets.DateTimeNoLeap type by year (y), month (m, default 1), day (d, default 1), hour (h, default 0), minute (mi, default 0), second (s, default 0), millisecond (ms, default 0). All arguments must be convertible to Int64. NCDatasets.DateTimeNoLeap is a subtype of AbstractCFDateTime.

The netCDF CF calendars are defined at [1].

[1] https://web.archive.org/web/20180622080424/http://cfconventions.org/cf-conventions/cf-conventions.html#calendar

NCDatasets.DateTime360Day(y, [m, d, h, mi, s, ms]) -> NCDatasets.DateTime360Day

Construct a NCDatasets.DateTime360Day type by year (y), month (m, default 1), day (d, default 1), hour (h, default 0), minute (mi, default 0), second (s, default 0), millisecond (ms, default 0). All arguments must be convertible to Int64. NCDatasets.DateTime360Day is a subtype of AbstractCFDateTime.

The netCDF CF calendars are defined at [1].

[1] https://web.archive.org/web/20180622080424/http://cfconventions.org/cf-conventions/cf-conventions.html#calendar

Base.Dates.year - Method.
Dates.year(dt::AbstractCFDateTime) -> Int64

Extract the year-part of an AbstractCFDateTime as an Int64.

Base.Dates.month - Method.
Dates.month(dt::AbstractCFDateTime) -> Int64

Extract the month-part of an AbstractCFDateTime as an Int64.

Base.Dates.day - Method.
Dates.day(dt::AbstractCFDateTime) -> Int64

Extract the day-part of an AbstractCFDateTime as an Int64.

Base.Dates.hour - Method.
Dates.hour(dt::AbstractCFDateTime) -> Int64

Extract the hour-part of an AbstractCFDateTime as an Int64.

Base.Dates.minute - Method.
Dates.minute(dt::AbstractCFDateTime) -> Int64

Extract the minute-part of an AbstractCFDateTime as an Int64.

Base.Dates.second - Method.
Dates.second(dt::AbstractCFDateTime) -> Int64

Extract the second-part of an AbstractCFDateTime as an Int64.

Dates.millisecond(dt::AbstractCFDateTime) -> Int64

Extract the millisecond-part of an AbstractCFDateTime as an Int64.

Base.convert - Function.
dt2 = convert(::Type{T}, dt)

Convert a DateTime of type DateTimeStandard, DateTimeProlepticGregorian, DateTimeJulian or DateTime into the type T which can also be either DateTimeStandard, DateTimeProlepticGregorian, DateTimeJulian or DateTime.

Conversion is done such that durations (differences of DateTime types) are preserved. For dates on and after 1582-10-15, the year, month and days are the same for the types DateTimeStandard, DateTimeProlepticGregorian and DateTime.

For dates before 1582-10-15, the year, month and days are the same for the types DateTimeStandard and DateTimeJulian.

Base.reinterpret - Function.
dt2 = reinterpret(::Type{T}, dt)

Convert a variable dt of type DateTime, DateTimeStandard, DateTimeJulian, DateTimeProlepticGregorian, DateTimeAllLeap, DateTimeNoLeap or DateTime360Day into the date time type T using the same values for year, month, day, hour, minute, second and millisecond. The conversion might fail if a particular date does not exist in the target calendar.

NCDatasets.timedecode - Function.
dt = timedecode(data,units,calendar = "standard", prefer_datetime = true)

Decode the time information in data as given by the units units according to the specified calendar. Valid values for calendar are "standard", "gregorian", "proleptic_gregorian", "julian", "noleap", "365_day", "all_leap", "366_day" and "360_day".

If prefer_datetime is true (default), dates are converted to the DateTime type (for the calendars "standard", "gregorian", "proleptic_gregorian" and "julian"). Such a conversion is not possible for the other calendars.

Calendar               Type (prefer_datetime=true)   Type (prefer_datetime=false)
standard, gregorian    DateTime                      DateTimeStandard
proleptic_gregorian    DateTime                      DateTimeProlepticGregorian
julian                 DateTime                      DateTimeJulian
noleap, 365_day        DateTimeNoLeap                DateTimeNoLeap
all_leap, 366_day      DateTimeAllLeap               DateTimeAllLeap
360_day                DateTime360Day                DateTime360Day

NCDatasets.timeencode - Function.
data = timeencode(dt,units,calendar = "standard")

Convert a vector or array of DateTime (or DateTimeStandard, DateTimeProlepticGregorian, DateTimeJulian, DateTimeNoLeap, DateTimeAllLeap, DateTime360Day) according to the specified units (e.g. "days since 2000-01-01 00:00:00") using the calendar calendar. Valid values for calendar are: "standard", "gregorian", "proleptic_gregorian", "julian", "noleap", "365_day", "all_leap", "366_day", "360_day".
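
The relation between encoded numbers and dates can be sketched with the Dates standard library. The helpers decode_days and encode_days are hypothetical names and not part of NCDatasets; they assume units of the form "days since <reference date>" and dates after 1582-10-15, where the standard calendar agrees with DateTime.

```julia
using Dates

# Number of milliseconds in one day.
const MS_PER_DAY = 24 * 60 * 60 * 1000

# Decode: add the elapsed time (in days) to the reference date `t0`.
decode_days(data, t0::DateTime) =
    [t0 + Millisecond(round(Int64, d * MS_PER_DAY)) for d in data]

# Encode: express the time elapsed since `t0` in days.
encode_days(dt, t0::DateTime) =
    [Dates.value(t - t0) / MS_PER_DAY for t in dt]

t0 = DateTime(2000, 1, 1)                   # "days since 2000-01-01 00:00:00"
dt = decode_days([0.0, 1.5, 31.0], t0)
data = encode_days(dt, t0)
```

Encoding the decoded dates gives back the original numbers, which is the round trip one would also expect from timeencode and timedecode when called with the same units and calendar.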

monthlength = daysinmonth(::Type{DT},y,m)

Returns the number of days in a month for the year y and the month m according to the calendar given by the type DT.

Example

julia> daysinmonth(DateTimeAllLeap,2001,2)
29

monthlength = daysinmonth(t)

Returns the number of days in the month containing the date t.

Example

julia> daysinmonth(DateTimeAllLeap(2001,2,1))
29

Base.Dates.daysinyear - Function.
yearlength = daysinyear(::Type{DT},y)

Returns the number of days in a year for the year y according to the calendar given by the type DT.

Example

julia> daysinyear(DateTimeAllLeap,2001)
366

yearlength = daysinyear(t)

Returns the number of days in the year containing the date t.

Example

julia> daysinyear(DateTimeAllLeap(2001,2,1))
366
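
How the year length depends on the calendar can be summarized in a few lines. The function yearlength_sketch below is a hypothetical helper based on the Dates standard library and the calendar names accepted by timedecode; it is not the NCDatasets implementation and leaves out the "standard" and "julian" calendars.

```julia
using Dates

# Length of the year `y` in days for a CF calendar name (illustrative helper).
function yearlength_sketch(calendar::AbstractString, y::Integer)
    if calendar in ("noleap", "365_day")
        return 365                              # never a leap year
    elseif calendar in ("all_leap", "366_day")
        return 366                              # every year is a leap year
    elseif calendar == "360_day"
        return 360                              # twelve months of 30 days
    elseif calendar == "proleptic_gregorian"
        return 365 + Dates.isleapyear(y)        # Gregorian leap-year rule
    else
        error("calendar $calendar is not covered by this sketch")
    end
end

n_allleap = yearlength_sketch("all_leap", 2001)
n_noleap  = yearlength_sketch("noleap", 2000)
```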

Utility functions

NCDatasets.ncgen - Function.
ncgen(fname; ...)
ncgen(fname,jlname; ...)

Generate the Julia code that would produce a NetCDF file with the same metadata as the NetCDF file fname. The code is placed in the file jlname or printed to the standard output. By default the new NetCDF file is called filename.nc. This can be changed with the optional parameter newfname.

NCDatasets.nomissing - Function.
a = nomissing(da)

Return the values of the array da of type Array{Union{T,Missing},N} (potentially containing missing values) as a regular Julia array a of the same element type, checking that no missing values are present.

a = nomissing(da,value)

Return the values of the array da of type Array{Union{T,Missing},N} as a regular Julia array a by replacing all missing values by value.
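
The two variants can be imitated with Base Julia alone. The function nomissing_sketch is a hypothetical stand-in written for this illustration, not the package's code; the error message in particular is made up.

```julia
# Variant 1: fail if a missing value is present.
function nomissing_sketch(da::Array{Union{T,Missing},N}) where {T,N}
    any(ismissing, da) && error("the array contains missing values")
    return Array{T,N}(da)
end

# Variant 2: replace every missing value by `value`.
function nomissing_sketch(da::Array{Union{T,Missing},N}, value) where {T,N}
    return Array{T,N}(coalesce.(da, T(value)))
end

da = Union{Float64,Missing}[1.0, missing, 3.0]
a = nomissing_sketch(da, -9999)                          # missing is replaced
b = nomissing_sketch(Union{Float64,Missing}[1.0, 2.0])   # nothing to replace
```

In both cases the result is an array with the element type Float64, so it can be passed to code that does not handle missing values.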

varbyattrib(ds, attname = attval)

Returns a list of the variables that have the attribute attname matching the value attval in the dataset ds. The list is empty if none of the variables match. The output is a list of CFVariables.

Examples

Load all the data of the first variable with standard name "longitude" from the NetCDF file results.nc.

julia> ds = Dataset("results.nc", "r");
julia> data = varbyattrib(ds, standard_name = "longitude")[1][:]

Experimental functions

NCDatasets.ancillaryvariables
NCDatasets.filter

Issues

libnetcdf not properly installed

If you see the following error,

ERROR: LoadError: LoadError: libnetcdf not properly installed. Please run Pkg.build("NCDatasets")

you can try to install netcdf explicitly with Conda:

using Conda
Conda.add("libnetcdf")

NetCDF: Not a valid data type or _FillValue type mismatch

Trying to define the _FillValue produces the following error:

ERROR: LoadError: NCDatasets.NetCDFError(-45, "NetCDF: Not a valid data type or _FillValue type mismatch")

The error could be generated by a code like this:

using NCDatasets
# ...
tempvar = defVar(ds,"temp",Float32,("lonc","latc","time"))
tempvar.attrib["_FillValue"] = -9999.

In fact, _FillValue must have the same data type as the corresponding variable. In the case above, tempvar is a 32-bit float and the number -9999. is a 64-bit float (aka double, which is the default floating point type in Julia). It is sufficient to convert the value -9999. to a 32-bit float:

tempvar.attrib["_FillValue"] = Float32(-9999.)
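
The type difference can be checked without any NetCDF file. In the sketch below, fillvalue_for is a hypothetical helper (not part of NCDatasets) that converts a fill value to the element type of the variable before it is assigned.

```julia
# Hypothetical helper: convert a fill value to the element type of the variable.
fillvalue_for(::Type{T}, value) where {T} = convert(T, value)

typeof(-9999.)                        # Float64, the default floating point type
fv = fillvalue_for(Float32, -9999.)
typeof(fv)                            # Float32, matching a Float32 variable
```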

Corner cases

An attribute representing a vector of chars ['u','n','i','t','s'] will be read back as the string "units".