NCDatasets.jl

Documentation for NCDatasets.jl

Datasets

NCDatasets.Dataset — Type.

Dataset(filename::AbstractString,mode::AbstractString = "r";
                 format::Symbol = :netcdf4, attrib = [])

Create a new NetCDF file if the mode is "c". An existing file with the same name will be overwritten. If mode is "a", an existing file is opened in append mode (i.e. existing data in the netCDF file is not overwritten and variables can be added). With the mode set to "r", an existing netCDF file or OPeNDAP URL is opened in read-only mode. The default mode is "r". The optional parameter attrib is an iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below).

Supported formats:

:netcdf4 (default): HDF5-based NetCDF format.
:netcdf4_classic: Only netCDF 3 compatible API features will be used.
:netcdf3_classic: classic netCDF format supporting only files smaller than 2GB.
:netcdf3_64bit_offset: improved netCDF format supporting files larger than 2GB.

Files can also be opened and automatically closed with a do block.

Dataset("file.nc") do ds
    data = ds["temperature"][:,:]
end

Dataset("file.nc", "c", attrib = ["title" => "my first netCDF file"]) do ds
   defVar(ds,"temp",[10.,20.,30.],("time",))
end;
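A minimal round-trip sketch, assuming NCDatasets is installed and that a scratch path from tempname() is acceptable. The first do block creates the file (the dimension "time" is defined from the data, as in the example above); the second reopens it read-only. Dataset closes the file after the block and returns the block's last value.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

# create the file with a global attribute and one variable
Dataset(fname, "c", attrib = ["title" => "my first netCDF file"]) do ds
    defVar(ds, "temp", [10., 20., 30.], ("time",))
end

# reopen read-only; the value of the block is returned by Dataset
temp, title = Dataset(fname) do ds
    ds["temp"][:], ds.attrib["title"]
end
```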

mfds = Dataset(fnames,mode = "r"; aggdim = nothing)

Opens a multi-file dataset in read-only "r" or append mode "a". fnames is a vector of file names. Variables are aggregated over the first unlimited dimension or over the dimension aggdim if specified.

Note: all files are opened at the same time. However, the operating system might limit the number of open files. In Linux, the limit can be controlled with the command ulimit [1,2].

All variables containing the dimension aggdim are aggregated. Variables that do not contain the dimension aggdim are assumed to be constant.

[1]: https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux
[2]: https://unix.stackexchange.com/questions/8945/how-can-i-increase-open-files-limit-for-all-processes/8949#8949
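A sketch of multi-file aggregation, assuming NCDatasets is installed. Two hypothetical scratch files each hold part of a time series along an unlimited dimension "time". Opening them with a vector of file names should concatenate "temp" along that dimension.

```julia
using NCDatasets

# two hypothetical scratch files, each holding part of a time series
fnames = [tempname() * ".nc" for i = 1:2]
chunks = [[1.0, 2.0], [3.0, 4.0, 5.0]]

for (fname, chunk) in zip(fnames, chunks)
    Dataset(fname, "c") do ds
        defDim(ds, "time", Inf)                  # unlimited dimension
        v = defVar(ds, "temp", Float64, ("time",))
        v[1:length(chunk)] = chunk               # writing grows "time"
    end
end

# aggregate over the first unlimited dimension ("time")
mfds = Dataset(fnames)
temp = mfds["temp"][:]
close(mfds)
```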

Base.keys — Method.

keys(ds::Dataset)

Return a list of all variable names in the Dataset ds.

Base.haskey — Function.

haskey(ds::Dataset,varname)

Return true if the Dataset ds has a variable with the name varname. For example:

ds = Dataset("/tmp/test.nc","r")
if haskey(ds,"temperature")
    println("The file has a variable 'temperature'")
end
close(ds)

This example checks if the file /tmp/test.nc has a variable with the name temperature.
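A short sketch combining keys and haskey on a freshly created file. It assumes NCDatasets is installed and uses a hypothetical scratch path instead of /tmp/test.nc.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file
Dataset(fname, "c") do ds
    defVar(ds, "temperature", [1.0, 2.0], ("time",))
    defVar(ds, "salinity", [35.0, 35.1], ("time",))
end

ds = Dataset(fname, "r")
varnames = keys(ds)                    # names of all variables
has_temp = haskey(ds, "temperature")
has_wind = haskey(ds, "wind")
close(ds)
```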

Base.getindex — Method.

getindex(ds::Dataset,varname::AbstractString)

Return the NetCDF variable varname in the dataset ds as a NCDatasets.CFVariable. The CF conventions are honored when the variable is indexed:

_FillValue will be returned as missing
scale_factor and add_offset are applied
time variables (recognized by the units attribute) are returned as DateTime objects.

A call getindex(ds,varname) is usually written as ds[varname].
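To make the first two CF rules concrete, here is a plain-Julia sketch of the arithmetic. cf_decode is a hypothetical helper, not NCDatasets' internal code, and time decoding is left out. Values equal to the fill value become missing; the others are unpacked as raw * scale_factor + add_offset.

```julia
# Hypothetical helper illustrating the CF rules listed above (not NCDatasets code):
# fill values become `missing`, other values are unpacked with scale and offset.
function cf_decode(raw::AbstractArray, fillvalue, scale_factor, add_offset)
    return [x == fillvalue ? missing : x * scale_factor + add_offset for x in raw]
end

raw = Int16[100, 200, -9999]                       # packed values as stored on disk
decoded = cf_decode(raw, Int16(-9999), 0.1, 10.0)  # ≈ [20.0, 30.0, missing]
```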

NCDatasets.variable — Function.

variable(ds::Dataset,varname::String)

Return the NetCDF variable varname in the dataset ds as a NCDatasets.Variable. No scaling is applied when this variable is indexed.
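A sketch contrasting variable (raw values) with ds[varname] (CF-decoded values). It assumes NCDatasets is installed and uses a hypothetical scratch file and a variable "x" packed as Int16 with scale_factor, add_offset and a fill value.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file
Dataset(fname, "c") do ds
    defDim(ds, "n", 3)
    defVar(ds, "x", Int16, ("n",), fillvalue = Int16(-9999),
           attrib = ["scale_factor" => 0.1, "add_offset" => 10.0])
    # write the packed values directly, bypassing the CF transformations
    variable(ds, "x")[:] = Int16[100, 200, -9999]
end

ds = Dataset(fname)
raw = variable(ds, "x")[:]   # stored values: no scaling, no missing
cf = ds["x"][:]              # scaled values, fill value mapped to missing
close(ds)
```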

NCDatasets.sync — Function.

sync(ds::Dataset)

Write all changes in Dataset ds to the disk.

Base.close — Function.

close(ds::Dataset)

Close the Dataset ds. All pending changes will be written to the disk.

NCDatasets.path — Function.

path(ds::Dataset)

Return the file path (or the OPeNDAP URL) of the Dataset ds.
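A sketch of sync, path and close on a dataset opened without a do block, assuming NCDatasets is installed and a hypothetical scratch path. The functions are called with the NCDatasets. prefix, matching the names used in this section.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file
ds = Dataset(fname, "c")
defVar(ds, "temp", [1.0, 2.0, 3.0], ("time",))
NCDatasets.sync(ds)          # flush pending changes; the dataset stays open
p = NCDatasets.path(ds)      # the file name passed to Dataset
close(ds)                    # writes any remaining changes and closes the file
```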

Variables

NCDatasets.defVar — Function.

defVar(ds::Dataset,name,vtype,dimnames; kwargs...)
defVar(ds::Dataset,name,data,dimnames; kwargs...)

Define a variable with the name name in the dataset ds. vtype can be any of the Julia types in the table below (with the corresponding NetCDF type). Instead of providing the variable type, one can also directly give the data data, which will be used to fill the NetCDF variable. The parameter dimnames is a tuple with the names of the dimensions. For a scalar, this parameter is the empty tuple (). The variable is returned (of the type CFVariable).

Note: if data is a vector or array of DateTime objects, then the dates are saved as double-precision floats with the units "days since 1900-00-00 00:00:00" (unless a time unit is specified with the attrib keyword described below).
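A sketch of the DateTime case, assuming NCDatasets and the Dates standard library are available and using a hypothetical scratch file. No units are given, so the default units are written. The check below only relies on the "days since" prefix of that string.

```julia
using NCDatasets, Dates

fname = tempname() * ".nc"   # hypothetical scratch file
times = [DateTime(2000, 1, 1) + Day(i) for i = 0:2]

Dataset(fname, "c") do ds
    defVar(ds, "time", times, ("time",))   # no units given: the default is used
end

units, back = Dataset(fname) do ds
    ds["time"].attrib["units"], ds["time"][:]
end
```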

Keyword arguments

fillvalue: A value filled in the NetCDF file to indicate missing data. It will be stored in the _FillValue attribute.
chunksizes: Vector of integers setting the chunk size. The total size of a chunk must be less than 4 GiB.
deflatelevel: Compression level: 0 (default) means no compression and 9 means maximum compression. Each chunk will be compressed individually.
shuffle: If true, the shuffle filter is activated which can improve the compression ratio.
checksum: The checksum method can be :fletcher32 or :nochecksum (checksumming is disabled, which is the default)
attrib: An iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below)
typename (string): The name of the NetCDF type required for vlen arrays [1]

chunksizes, deflatelevel, shuffle and checksum can only be set on NetCDF 4 files.

NetCDF data types

NetCDF Type	Julia Type
NC_BYTE	Int8
NC_UBYTE	UInt8
NC_SHORT	Int16
NC_INT	Int32
NC_INT64	Int64
NC_FLOAT	Float32
NC_DOUBLE	Float64
NC_CHAR	Char
NC_STRING	String

Example:

julia> using NCDatasets
julia> data = randn(3,5)
julia> Dataset("test_file.nc","c") do ds
          defVar(ds,"temp",data,("lon","lat"), attrib = [
             "units" => "degree_Celsius",
             "long_name" => "Temperature"
          ])
       end;

[1]: https://web.archive.org/save/https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf-c/nc005fdef005fvlen.html

NCDatasets.dimnames — Function.

dimnames(v::Variable)

Return a tuple of the dimension names of the variable v.

NCDatasets.name — Function.

name(v::Variable)

Return the name of the NetCDF variable v.

NCDatasets.chunking — Function.

storage,chunksizes = chunking(v::Variable)

Return the storage type (:contiguous or :chunked) and the chunk sizes of the variable v.

NCDatasets.deflate — Function.

isshuffled,isdeflated,deflate_level = deflate(v::Variable)

Return compression information of the variable v. If isshuffled is true, then shuffling (byte interlacing) is activated. If isdeflated is true, then the data chunks (see chunking) are compressed using the compression level deflate_level (0 means no compression and 9 means maximum compression).

NCDatasets.checksum — Function.

checksummethod = checksum(v::Variable)

Return the checksum method of the variable v, which can be either :fletcher32 or :nochecksum.
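A sketch tying the defVar storage keywords to chunking, deflate and checksum. It assumes NCDatasets is installed and uses a hypothetical scratch file in the default :netcdf4 format, which these settings require. The inspection functions are applied to the Variable returned by variable, matching the signatures above.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file
Dataset(fname, "c") do ds    # default format :netcdf4
    defDim(ds, "lon", 10)
    defDim(ds, "lat", 10)
    v = defVar(ds, "temp", Float32, ("lon", "lat"),
               chunksizes = [5, 5], deflatelevel = 5, shuffle = true)
    v[:, :] = rand(Float32, 10, 10)
end

ds = Dataset(fname)
v = variable(ds, "temp")
storage, chunksizes = NCDatasets.chunking(v)
isshuffled, isdeflated, deflate_level = NCDatasets.deflate(v)
method = NCDatasets.checksum(v)   # checksumming was not requested
close(ds)
```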

NCDatasets.loadragged — Function.

data = loadragged(ncvar,index::Colon)

Load data from ncvar in the contiguous ragged array representation [1] as a vector of vectors. It is typically used to load a list of profiles or time series, each of a different length.

The indexed ragged array representation [2] is currently not supported.

[1]: https://web.archive.org/web/20190111092546/http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#contiguousraggedarrayrepresentation
[2]: https://web.archive.org/web/20190111092546/http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#indexedraggedarrayrepresentation
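The contiguous representation stores all profiles back to back in one flat vector. A CF count variable (marked with the sample_dimension attribute) gives the number of elements of each profile. The plain-Julia sketch below shows the splitting step. split_contiguous_ragged is a hypothetical helper, not the loadragged implementation.

```julia
# Hypothetical helper (not part of NCDatasets): split the flat `data` vector into
# one vector per profile, using the per-profile element counts `rowsize`.
function split_contiguous_ragged(data::AbstractVector, rowsize::AbstractVector{<:Integer})
    @assert sum(rowsize) == length(data) "counts must cover the data exactly"
    stop = cumsum(rowsize)              # last index of each profile
    start = stop .- rowsize .+ 1        # first index of each profile
    return [data[start[i]:stop[i]] for i in eachindex(rowsize)]
end

profiles = split_contiguous_ragged([1, 2, 3, 4, 5, 6], [2, 1, 3])
```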

Different types of arrays are involved when working with NCDatasets. For instance, assume that test.nc is a file with a Float32 variable called var, and that we open this dataset in append mode ("a"):

using NCDatasets
ds = Dataset("test.nc","a")
v_cf = ds["var"]

The variable v_cf has the type CFVariable. No data is actually loaded from disk, but you can query its size, number of dimensions and number of elements with the functions size, ndims and length, as for ordinary Julia arrays. Once you index the variable v_cf, the data is loaded and stored into a DataArray:

v_da = v_cf[:,:]
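A self-contained version of this walkthrough, assuming NCDatasets is installed. A hypothetical scratch file stands in for test.nc and is first created with a 3×4 Float32 variable "var".

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical stand-in for test.nc
Dataset(fname, "c") do ds
    defVar(ds, "var", rand(Float32, 3, 4), ("x", "y"))
end

ds = Dataset(fname, "a")
v_cf = ds["var"]                                 # metadata only, no data read
dims = (size(v_cf), ndims(v_cf), length(v_cf))   # queried like an ordinary array
v_da = v_cf[:, :]                                # indexing loads the data
close(ds)
```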

Attributes

The NetCDF dataset (as returned by Dataset or NetCDF groups) and the NetCDF variables (as returned by getindex, variable or defVar) have the field attrib, which has the type NCDatasets.Attributes and behaves like a Julia dictionary.

Base.getindex — Method.

getindex(a::Attributes,name::AbstractString)

Return the value of the attribute called name from the attribute list a. Generally the attributes are loaded by indexing, for example:

ds = Dataset("file.nc")
title = ds.attrib["title"]

Base.setindex! — Method.

Base.setindex!(a::Attributes,data,name::AbstractString)

Set the attribute called name to the value data in the attribute list a. Generally the attributes are defined by indexing, for example:

ds = Dataset("file.nc","c")
ds.attrib["title"] = "my title"

Base.keys — Method.

Base.keys(a::Attributes)

Return a list of the names of all attributes.
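A sketch of global and variable attributes set by indexing and read back, assuming NCDatasets is installed and using a hypothetical scratch file.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file
Dataset(fname, "c") do ds
    ds.attrib["title"] = "my title"              # global attribute
    v = defVar(ds, "temp", [1.0, 2.0], ("time",))
    v.attrib["units"] = "degree_Celsius"         # variable attribute
end

title, units, varattribs = Dataset(fname) do ds
    ds.attrib["title"], ds["temp"].attrib["units"], keys(ds["temp"].attrib)
end
```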

Dimensions

NCDatasets.defDim — Function.

defDim(ds::Dataset,name,len)

Define a dimension in the data set ds with the given name and length len. If len is the special value Inf, then the dimension is considered as unlimited, i.e. it will grow as data is added to the NetCDF file.

For example:

ds = Dataset("/tmp/test.nc","c")
defDim(ds,"lon",100)

This defines the dimension lon with the size 100.

Base.setindex! — Method.

Base.setindex!(d::Dimensions,len,name::AbstractString)

Define the dimension called name with the length len. Generally dimensions are defined by indexing, for example:

ds = Dataset("file.nc","c")
ds.dim["longitude"] = 100

If len is the special value Inf, then the dimension is considered as unlimited, i.e. it will grow as data is added to the NetCDF file.
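A sketch of a fixed and an unlimited dimension, assuming NCDatasets is installed and using a hypothetical scratch file. The unlimited dimension "time" starts empty and grows to 3 when three time steps are written.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file
ds = Dataset(fname, "c")
defDim(ds, "lon", 100)            # fixed length
ds.dim["time"] = Inf              # unlimited, initially of length 0
v = defVar(ds, "temp", Float64, ("lon", "time"))
v[:, 1:3] = zeros(100, 3)         # writing three time steps grows "time"
nlon, ntime = ds.dim["lon"], ds.dim["time"]
close(ds)
```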

NCDatasets.dimnames — Method.

dimnames(v::Variable)

Return a tuple of the dimension names of the variable v.

Groups

NCDatasets.defGroup — Method.

defGroup(ds::Dataset,groupname; attrib = [])

Create the group with the name groupname in the dataset ds. attrib is a list of attribute name and attribute value pairs (see Dataset).

Base.getindex — Method.

group = getindex(g::NCDatasets.Groups,groupname::AbstractString)

Return the NetCDF group with the name groupname. For example:

julia> ds = Dataset("results.nc", "r");
julia> forecast_group = ds.group["forecast"]
julia> forecast_temp = forecast_group["temperature"]

Base.keys — Method.

Base.keys(g::NCDatasets.Groups)

Return the names of all subgroups of the group g.
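A sketch that creates a group, defines a variable inside it and reads it back through ds.group. It assumes NCDatasets is installed, a hypothetical scratch file, and that defGroup returns the new group so that defVar can be called on it.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file
Dataset(fname, "c") do ds
    forecast = defGroup(ds, "forecast")          # the new group
    defVar(forecast, "temperature", [280.0, 281.5], ("time",))
end

groupnames, temp = Dataset(fname) do ds
    keys(ds.group), ds.group["forecast"]["temperature"][:]
end
```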

Common methods

One can iterate over a dataset, attribute list, dimensions and NetCDF groups.

for (varname,var) in ds
    # all variables
    @show (varname,size(var))
end

for (dimname,dim) in ds.dim
    # all dimensions
    @show (dimname,dim)
end

for (attribname,attrib) in ds.attrib
    # all attributes
    @show (attribname,attrib)
end

for (groupname,group) in ds.group
    # all groups
    @show (groupname,group)
end

Time functions

DateTimeStandard
DateTimeJulian
DateTimeProlepticGregorian
DateTimeAllLeap
DateTimeNoLeap
DateTime360Day
Dates.year(dt::AbstractCFDateTime)
Dates.month(dt::AbstractCFDateTime)
Dates.day(dt::AbstractCFDateTime)
Dates.hour(dt::AbstractCFDateTime)
Dates.minute(dt::AbstractCFDateTime)
Dates.second(dt::AbstractCFDateTime)
Dates.millisecond(dt::AbstractCFDateTime)
convert
reinterpret
timedecode
timeencode
daysinmonth
daysinyear

Utility functions

NCDatasets.ncgen — Function.

ncgen(fname; ...)
ncgen(fname,jlname; ...)

Generate the Julia code that would produce a NetCDF file with the same metadata as the NetCDF file fname. The code is placed in the file jlname or printed to the standard output. By default, the generated code writes a NetCDF file called filename.nc; this can be changed with the optional parameter newfname.
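A sketch of ncgen writing the generated code to a file, assuming NCDatasets is installed and using hypothetical scratch paths. The check only looks for a defVar call and the attribute name in the generated text, not for its exact layout.

```julia
using NCDatasets

fname = tempname() * ".nc"    # hypothetical scratch NetCDF file
jlname = tempname() * ".jl"   # hypothetical file receiving the generated code
Dataset(fname, "c", attrib = ["title" => "example"]) do ds
    defVar(ds, "temp", [1.0, 2.0], ("time",))
end

NCDatasets.ncgen(fname, jlname)   # code recreating the metadata (not the data)
code = read(jlname, String)
```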

NCDatasets.nomissing — Function.

a = nomissing(da)

Return the values of the array da of type Array{Union{T,Missing},N} (potentially containing missing values) as a regular Julia array a with element type T, and check that no missing values are present.

a = nomissing(da,value)

Return the values of the array da of type Array{Union{T,Missing},N} as a regular Julia array a by replacing all missing values by value.
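A small sketch of both forms on an in-memory array, assuming NCDatasets is installed. No file is needed because nomissing works on any Array{Union{T,Missing},N}.

```julia
using NCDatasets

da = [1.0, missing, 3.0]                 # Vector{Union{Missing, Float64}}
a1 = NCDatasets.nomissing(da, NaN)       # missing replaced by NaN
a2 = NCDatasets.nomissing(da[[1, 3]])    # no missing values: plain Vector{Float64}
```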

NCDatasets.varbyattrib — Function.

varbyattrib(ds, attname = attval)

Return a list of the variables that have the attribute attname with the value attval in the dataset ds. The list is empty if none of the variables matches. The output is a list of CFVariables.

Examples

Load all the data of the first variable with standard name "longitude" from the NetCDF file results.nc.

julia> ds = Dataset("results.nc", "r");
julia> data = varbyattrib(ds, standard_name = "longitude")[1][:]
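A self-contained version of this lookup, assuming NCDatasets is installed. A hypothetical scratch file with two coordinate variables replaces results.nc, and a second query shows the empty result when nothing matches.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical stand-in for results.nc
Dataset(fname, "c") do ds
    defVar(ds, "lon", [10.0, 20.0], ("lon",), attrib = ["standard_name" => "longitude"])
    defVar(ds, "lat", [50.0, 60.0], ("lat",), attrib = ["standard_name" => "latitude"])
end

lons, nomatch = Dataset(fname) do ds
    (varbyattrib(ds, standard_name = "longitude")[1][:],
     varbyattrib(ds, standard_name = "depth"))
end
```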

Experimental functions

NCDatasets.ancillaryvariables
NCDatasets.filter

Issues

libnetcdf not properly installed

If you see the following error,

ERROR: LoadError: LoadError: libnetcdf not properly installed. Please run Pkg.build("NCDatasets")

you can try to install netcdf explicitly with Conda:

using Conda
Conda.add("libnetcdf")

NetCDF: Not a valid data type or _FillValue type mismatch

Trying to define the _FillValue produces the following error:

ERROR: LoadError: NCDatasets.NetCDFError(-45, "NetCDF: Not a valid data type or _FillValue type mismatch")

The error could be generated by a code like this:

using NCDatasets
# ...
tempvar = defVar(ds,"temp",Float32,("lonc","latc","time"))
tempvar.attrib["_FillValue"] = -9999.

In fact, _FillValue must have the same data type as the corresponding variable. In the case above, tempvar is a 32-bit float and the number -9999. is a 64-bit float (aka double, which is the default floating point type in Julia). It is sufficient to convert the value -9999. to a 32-bit float:

tempvar.attrib["_FillValue"] = Float32(-9999.)
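Alternatively, the fill value can be given through the fillvalue keyword of defVar, written with the variable's own type (here the Float32 literal -9999f0). This is a sketch assuming NCDatasets is installed and a hypothetical scratch file; it reads the attribute back to show it is stored as Float32.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file
Dataset(fname, "c") do ds
    defDim(ds, "lonc", 2); defDim(ds, "latc", 2); defDim(ds, "time", 1)
    # the fill value has the same type (Float32) as the variable
    defVar(ds, "temp", Float32, ("lonc", "latc", "time"), fillvalue = -9999f0)
end

fv = Dataset(fname) do ds
    ds["temp"].attrib["_FillValue"]
end
```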

Corner cases

An attribute representing a vector with a single value (e.g. [1]) will be read back as a scalar (1) (same behavior in python netCDF4 1.3.1).
NetCDF and Julia distinguish between a vector of chars and a string, but both are returned as a string for ease of use. In particular:

An attribute representing a vector of chars ['u','n','i','t','s'] will be read back as the string "units".
An attribute representing a vector of chars ['u','n','i','t','s','\0'] will also be read back as the string "units" (issue #12).
