Welcome to HighTEA client’s documentation

User software for interacting with the HighTEA database from the Centre for Precision Studies in Particle Physics. See the central website of HighTEA (High-energy Theory Event Analysis):

http://www.precision.hep.phy.cam.ac.uk/hightea/

and the physics publication:

arXiv:2304.05993

This README, together with the inline documentation of the Python library, is also available on ReadTheDocs.

Installation

This package is available on PyPI and can be installed via:

pip install hightea-client

The Python library interface can be imported via import hightea.client. For a quick start, we recommend that new users have a look at the examples and tutorial available here:

HighTEA Examples.

For details on the functionality please refer to the ReadTheDocs documentation.

The HighTEA API

Through the interface, users make web requests to the HighTEA database service. Dynamic documentation of the API can be found here.

Here is a description of the most user-relevant API endpoints:

  • api/processes: Returns a key-value map in which the keys are the identifiers of the available processes and the values are short descriptions of the processes. The identifying keys are used in subsequent API calls to retrieve metadata or to perform analyses on the process. For example, browsing to https://www.hep.phy.cam.ac.uk/hightea/api/processes/ yields

    {
      "processes/pp_jx_7TeV":"pp -> j + X at 7 TeV",
      "processes/pp_aax_8TeV":"pp -> a a + X at 8 TeV",
      "processes/pp_ttx_13TeV":"pp -> t tbar +X at 13 TeV with mt = 172.5 GeV"
    }
    

    This API call takes no parameters.

  • api/processes/<PROCESS NAME>: Returns a key-value mapping with the metadata of the process with identifier <PROCESS NAME>. Details about the metadata provided can be found below.

    This API call takes no parameters.

  • api/processes/<PROCESS NAME>/hist: Returns a histogram for the given process based on the user input.

    The API call must be made using the POST request method, with a JSON object as data that conforms to the following schema:

    • "observables" (required, list of observable specifications): A list of dictionaries, each containing:

      • "name" (optional): A label for the observable.

      • "binning": A list of dictionaries, one per histogram dimension: a one-dimensional histogram has 1 element, a two-dimensional histogram 2 elements, and so on. Each dictionary contains:

        • "variable": Variable in which to bin.

        • "bins": An ordered list of numbers defining the bin edges. The string "Infinity" is allowed (as is "-Infinity").

      The resulting histogram will correspond to the outer product of all the bin specifications in the list.

    • "custom_variables" (optional, key value mapping): A map of names to expressions in terms of pre-existing variables or particle momenta. The variables defined here become available for usage in bin specifications, scale definitions and cuts. The expression syntax is described in more detail below.

    • "cuts" (optional, list of cuts): A list of inequalities between expressions. The histogram will only consider events for which the inequalities are fulfilled. The expression syntax is described in more detail below.

    • "pdf" (optional, string): The PDF set used to reweight the events. If given, $\alpha_S$ and the scales will be evaluated to match the new set.

    • "muR" (optional, string): An expression for the renormalization scale used for reweighting.

    • "muF" (optional, string): An expression for the factorization scale used for reweighting.

    • "contributions" (optional, list of strings): A subset of the contributions or contribution groups of a process, typically specifying the perturbative order. If specified, only the weights associated with the listed contributions are taken into account. By default all contributions are evaluated, which corresponds to the highest perturbative order.

    An example of a valid request payload is:

    {
        "contributions":["NNLO"],
        "custom_variables": {"circle": "sqrt(pt_t**2 + pt_tbar**2)"},
        "cuts": ["y_tbar <  4", "pt_tbar > 10"],
        "pdf": "CT14nnlo",
        "pdf_member":0,
        "muR": "2*HTo4",
        "muF": "2*m_tt",
        "observables": [
           {
             "name":"my2Dobservable",
             "binning": [
               {
                 "variable": "circle",
                 "bins": [0, 20, 40, 60, 80, "Infinity"]
               },
               {
                 "variable": "m_tt",
                 "bins": [350, 450, 550, 650, "Infinity"]
               }
               ]
           }
           ]
    }
    
  • api/available_pdfs: Returns a list of the available PDFs that can be used for reweighting.
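To illustrate the hist schema, here is a minimal sketch using only the Python standard library. The helpers check_hist_payload, count_bins and post_hist are hypothetical and not part of hightea-client. The check mirrors only the rules stated above (required observables, list-valued binning, ordered bin edges with "Infinity" allowed); post_hist assumes that the endpoint accepts an unauthenticated JSON POST, which may not be the case, and returns the reply as-is (judging from the CLI output further down, the reply may describe a pending token rather than a finished histogram).

```python
import json
import math
import urllib.request

# Base URL taken from this documentation.
ENDPOINT = "https://www.hep.phy.cam.ac.uk/hightea/api/"


def check_hist_payload(payload: dict) -> None:
    """Raise ValueError if payload breaks the stated hist schema (local sketch only)."""
    observables = payload.get("observables")
    if not isinstance(observables, list) or not observables:
        raise ValueError('"observables" is required and must be a non-empty list')
    for obs in observables:
        binning = obs.get("binning")
        if not isinstance(binning, list) or not binning:
            raise ValueError('"binning" must be a non-empty list of dictionaries')
        for axis in binning:
            if "variable" not in axis or "bins" not in axis:
                raise ValueError('each binning entry needs "variable" and "bins"')
            # float() turns the strings "Infinity"/"-Infinity" into +/-inf
            edges = [float(edge) for edge in axis["bins"]]
            if any(hi <= lo for lo, hi in zip(edges, edges[1:])):
                raise ValueError(f"bin edges of {axis['variable']!r} are not increasing")


def count_bins(observable: dict) -> int:
    """Number of bins of one observable: the outer product of its bin specifications."""
    return math.prod(len(axis["bins"]) - 1 for axis in observable["binning"])


def post_hist(process: str, payload: dict) -> dict:
    """POST payload to api/processes/<process>/hist and return the decoded JSON reply.

    Authentication is not handled here (assumption: none is needed).
    """
    check_hist_payload(payload)
    request = urllib.request.Request(
        ENDPOINT + f"processes/{process}/hist",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For the example payload above, count_bins gives 5 x 4 = 20 bins, the outer product of the two bin specifications. In practice, the hightea.client.apiactions.API class documented below takes care of requests and authentication.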

Description of metadata

The returned mapping includes (but is not restricted to) the following keys:

  • name (string): The short description of the process.

  • details (string): A more detailed description of the process.

  • layout (list): A specification of the particles available for analysis. Currently, the valid specification for an item in the list is a key-value pair "particle_momenta": <NAME OF THE PARTICLE>, which indicates that the momenta of the listed particle are available for analysis. They are available as variables of the form p_<NAME OF THE PARTICLE>_<u>, where <u> is the 4-momentum index, one of {0, 1, 2, 3}. Index 0 corresponds to the energy, indices 1 and 2 to the components of the transverse momentum, and index 3 to the longitudinal momentum. All momenta are given in the laboratory frame. For example

    "layout": [
        {"particle_momenta": "t"},
        {"particle_momenta": "tbar"}
    ]
    

    in the top pair production process means that the variable p_t_0 is the energy of the top quark and p_tbar_3 is the longitudinal momentum of the anti-top quark.

    For final-state jets, the layout provides the jet momenta obtained by clustering the parton momenta with a jet algorithm (using either the default parameters specified below or those given in the request). The jets are ordered by transverse momentum and can be accessed as p_j1_0, etc.

  • variables (map of string to string): A mapping of variable names to expressions, corresponding to the default predefined variables available in the analysis. For example

    "variables": {
        "pt_t": "sqrt(p_t_1**2 + p_t_2**2)",
        "pt_tbar": "sqrt(p_tbar_1**2 + p_tbar_2**2)",
        "y_t": "0.5*log((p_t_0 + p_t_3)/(p_t_0 - p_t_3))",
        "y_tbar": "0.5*log((p_tbar_0 + p_tbar_3)/(p_tbar_0 - p_tbar_3))"
      }
    

    defines the transverse momentum and rapidity of the top and anti-top quarks.

  • default_jet_parameters (dictionary): A mapping containing the default values for nmaxjet, p and R.

  • scales_info (string): A short description of the default scales.

  • muR0 (string): The default renormalisation scale expressed in terms of a predefined variable.

  • muF0 (string): The default factorisation scale expressed in terms of a predefined variable.

  • pdf_set (string): The default PDF set.

  • pdf_member (string): The default PDF member.

  • contribution_groups (dictionary): A mapping of contribution groups to their sub-contributions.

  • available_pdfs (dictionary): Detailed information about the PDF sets available for this process, potentially including specialised process-specific PDFs. This additional information is used by the hightea.client package to automate PDF uncertainty estimates.
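The naming rule for layout momenta, as well as the pt_t and y_t definitions shown above, can be reproduced locally. Below is a minimal sketch in plain Python. The function names are hypothetical helpers and not part of hightea-client, and only particle_momenta entries of the layout are considered (any other entry is ignored).

```python
import math


def momentum_variables(layout: list) -> list:
    """Variable names p_<NAME>_<u> implied by a layout (only particle_momenta entries)."""
    names = []
    for item in layout:
        particle = item.get("particle_momenta")
        if particle is not None:
            # u = 0: energy, 1 and 2: transverse components, 3: longitudinal
            names.extend(f"p_{particle}_{u}" for u in range(4))
    return names


def transverse_momentum(p_1: float, p_2: float) -> float:
    """Same as the pt_t definition: sqrt(p_t_1**2 + p_t_2**2)."""
    return math.sqrt(p_1**2 + p_2**2)


def rapidity(p_0: float, p_3: float) -> float:
    """Same as the y_t definition: 0.5*log((p_t_0 + p_t_3)/(p_t_0 - p_t_3))."""
    return 0.5 * math.log((p_0 + p_3) / (p_0 - p_3))
```

Applied to the top-pair layout above, momentum_variables returns the eight names p_t_0 to p_t_3 and p_tbar_0 to p_tbar_3. These are the names a custom expression may refer to in addition to the predefined variables.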

Writing expressions

Several API parameters (cuts, muR, muF, custom_variables) involve expressions. These correspond to mathematical functions of the variables. Expressions are written using the conventions of the Python language (e.g. the operator ** is used for exponentiation and function calls use parentheses). Expressions admit:

  • References to particle momenta for the particles defined in the layout (e.g. p_tbar_0).

  • References to the predefined variables for the process.

  • References to custom_variables.

  • Mathematical operations such as +, / or **.

  • Parentheses.

  • Arithmetic and trigonometric functions, such as sqrt, log or min.
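Because expressions follow Python conventions, they can be checked locally before a request is sent. The sketch below evaluates an expression or a cut for given variable values using Python's ast module and a whitelist of operations. It only approximates the syntax described above and is not the server's parser. In particular, the function table beyond sqrt, log and min (exp, abs, max, sin, cos, tan) is an assumption, and evaluate is a hypothetical helper.

```python
import ast
import math
import operator

# Whitelisted operations; sqrt, log and min come from the text above,
# the remaining functions are assumptions.
BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
              ast.Div: operator.truediv, ast.Pow: operator.pow}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
COMPARISONS = {ast.Lt: operator.lt, ast.LtE: operator.le,
               ast.Gt: operator.gt, ast.GtE: operator.ge}
FUNCTIONS = {"sqrt": math.sqrt, "log": math.log, "exp": math.exp, "abs": abs,
             "min": min, "max": max, "sin": math.sin, "cos": math.cos, "tan": math.tan}


def evaluate(expression: str, variables: dict):
    """Evaluate an expression or a cut such as "pt_tbar > 10" for given variable values."""

    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]  # KeyError if the variable is not defined
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](ev(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS and not node.keywords):
            return FUNCTIONS[node.func.id](*[ev(arg) for arg in node.args])
        if isinstance(node, ast.Compare) and all(type(op) in COMPARISONS for op in node.ops):
            # chained comparisons such as "0 < y_t < 2" hold only if every link holds
            left = ev(node.left)
            for op, right_node in zip(node.ops, node.comparators):
                right = ev(right_node)
                if not COMPARISONS[type(op)](left, right):
                    return False
                left = right
            return True
        raise ValueError(f"unsupported syntax in {expression!r}")

    return ev(ast.parse(expression, mode="eval").body)
```

For example, evaluate("sqrt(p_t_1**2 + p_t_2**2)", {"p_t_1": 3.0, "p_t_2": 4.0}) gives 5.0, and a cut such as "y_tbar <  4" evaluates to True or False. Anything outside the whitelist raises ValueError instead of being executed.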

The HighTEA CLI

An alternative way to interact with the HighTEA API is the highteacli command-line interface. This executable should be available after installing the hightea-client package.

The basic workflow consists of providing requests in JSON format as input and then analysing the resulting output.

The most frequent command is:

highteacli hist <PROCESS NAME> <PATH TO THE JSON FILE FROM THE CURRENT DIRECTORY>

For 1D histograms, you can add the --plot argument to obtain a quick visualization of the result.

Available processes (to fill in <PROCESS NAME>) can be queried with

highteacli lproc

The format of the input is described in detail in the API section, and examples are provided.

For example, the rapidity (y) distribution of the top quark in top-quark pair production can be computed with the following input file (test.json):

{
  "observables": [
    {
      "binning": [
        {
          "variable": "y_t",
          "bins": [-2, -1, 0, 1, 2]
        }
      ]
    }
  ]
}
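Generating the input file with Python's json module avoids hand-editing mistakes such as trailing commas, which strict JSON parsers reject. Here is a minimal sketch that produces the same rapidity request, with "binning" given as a list of dictionaries as described in the API section. The file name matches the one used in the command that follows.

```python
import json

# One 1D observable: rapidity of the top quark binned in [-2, 2]
request = {
    "observables": [
        {"binning": [{"variable": "y_t", "bins": [-2, -1, 0, 1, 2]}]}
    ]
}

with open("test.json", "w") as f:
    json.dump(request, f, indent=2)
```

The resulting file can then be passed to highteacli hist.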

Now we can query the 13 TeV top-quark pair dataset as follows:

$ highteacli hist pp_tt_13000_172.5 test.json
Processing request. The token is cb7a4c94edea11ea8bc49d8a216f62d5.
Wait for the result here or run

    highteacli token cb7a4c94edea11ea8bc49d8a216f62d5

-Token completed
Result written to cb7a4c94edea11ea8bc49d8a216f62d5.json

Each successful invocation of the command generates a unique identifier, the token, which is associated with the requested computation. With the default options, the token is used to generate the output filename.

You can retrieve the data for an existing token, optionally with a simple visualization for 1D histograms. The output is written to the current directory.

$ highteacli token --plot cb7a4c94edea11ea8bc49d8a216f62d5
|Token completed
Result written to cb7a4c94edea11ea8bc49d8a216f62d5.json
/
Histogram plot writen to cb7a4c94edea11ea8bc49d8a216f62d5.png

The full set of options can be seen with the --help flag.

$ highteacli --help

And specific help for each command can be obtained with

$ highteacli --help <COMMAND>

hightea-client class documentation

Interface class

class hightea.client.interface.Interface(name: str, directory='.', overwrite=False, *, auth=None, endpoint='https://www.hep.phy.cam.ac.uk/hightea/api/')

High-level user interface to the HighTEA database.

Examples

>>> from hightea.client.interface import Interface
>>> job = Interface('jobname')
>>> job.process('pp_tt_13000_172.5')
>>> job.contribution('LO')
>>> job.observable('pt_t',[0.,50.,100.,150.,200.,250.])
>>> job.request()
>>> job.show_result()

list_processes(detailed=True)

Request the list of available processes from the server.

Parameters:

detailed (bool, default True) – If True detailed information for each process is provided, if False only the process key is shown.

list_pdfs()

Request the list of available PDFs from the server.

process(proc: str, verbose=True)

Define the process for this instance. A request to the server is performed and the process metadata is stored.

Parameters:
  • proc (str) – String containing the process key.

  • verbose (bool, default True) – If True the process information is printed.

define_new_variable(name: str, definition: str)

Define a new variable. The definition has to be a Python expression using predefined variables; see the process metadata for additional information.

Parameters:
  • name (str) – A name for the new variable

  • definition (str) – The definition can be given in terms of mathematical functions of the already defined variables. Expressions are written using the conventions of the Python language.

add_variable_definitions(definitions: dict)

Add variable definitions from dictionary.

The argument has to be a dictionary of "name":"definition" pairs.

Parameters:

definitions (dict) – The dictionary containing the definitions.

store_variable_definitions(filename: str)

Store variable definitions to file.

Parameters:

filename (str) – The name of the file to which the definitions are written.

load_variable_definitions(filename: str)

Load variable definitions from file.

The specified file is expected to contain a JSON dictionary of "name":"definition" pairs.

Parameters:

filename (str) – The filename containing the definitions.
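A definitions file for load_variable_definitions() is a JSON dictionary of "name":"definition" pairs. Below is a minimal sketch that writes such a file with the standard library. The file name my_variables.json and the two definitions are illustrative assumptions based on the top-pair variables used elsewhere in this documentation. Check them against the metadata of your process.

```python
import json

# Illustrative definitions in terms of top-pair momenta and predefined variables
definitions = {
    # transverse momentum of the top-quark pair
    "pt_tt": "sqrt((p_t_1 + p_tbar_1)**2 + (p_t_2 + p_tbar_2)**2)",
    "circle": "sqrt(pt_t**2 + pt_tbar**2)",
}

with open("my_variables.json", "w") as f:
    json.dump(definitions, f, indent=2)
```

The file can then be loaded with job.load_variable_definitions("my_variables.json"), or the dictionary can be passed directly to job.add_variable_definitions(definitions).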

request()

Create and submit requests for all defined histograms and wait for the answer. The results are stored.

submit_request()

Submit the request to the database without waiting for the answer. The answer can be retrieved via request_result().

request_result()

Request the results of a submitted request. The function returns True if all tokens have been completed, False otherwise.

Returns:

finished – If all requested tokens have been completed the return value is True, False otherwise.

Return type:

bool

get_requests()

Return the stored requests. These include not only the requests but also the results in raw format.

Notes

This routine is meant for debugging and troubleshooting.

result()

Return the results of a request, taking into account systematic uncertainties from the requested variations.

The returned dictionary can be used within the hightea-plotting routines.

Returns:

result – A dictionary containing the results.

Return type:

dict

raw_result()

Return the raw results of a job.

The returned dictionaries can be used within the hightea-plotting routines.

Returns:

results – A list of dictionaries containing the results.

Return type:

list(dict)

show_result()

Print the result in a human-readable form.

contribution(con)

Define contribution(s) for histogram.

Parameters:

con (str or list(str)) – A string (or list of strings) defining the contribution(s)

observable(variable, binning, name=None)

Add an observable defined by a variable and a bin specification.

Parameters:
  • variable (str or list(str)) – The variable to be binned.

  • binning (list(float) or list(list(float))) – A list of bin edges, or one list of bin edges per variable for multi-dimensional observables.

  • name (str) – A label for the observable (optional)

scales(muR: str, muF: str)

Define the central scale choices for the renormalization (muR) and factorization (muF) scales.

Parameters:
  • muR (str) – Expression to define muR.

  • muF (str) – Expression to define muF.

pdf(pdf: str, pdf_member=0)

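Define the PDF set.

Parameters:
  • pdf (str) – The name of the PDF set.

  • pdf_member (int, default 0) – The member of the PDF set.

scale_variation(variation_type: str)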

Specify the type of scale variation.

Implemented variations:
  • '3-point': 3-point variation around central scales

  • '7-point': 7-point variation around central scales

Other, individual types of variations can be specified via Interface.set_custom_variation().

Parameters:

variation_type (str) – String corresponding to a defined variation.

pdf_variation(method='smpdf')

Include PDF member variation

There are two methods of PDF variation: the standard ‘full’ variation and the more efficient ‘smpdf’ variation. If not specified explicitly, the client tries to use the ‘smpdf’ variation, depending on the availability of a corresponding reduced PDF set. If ‘smpdf’ is not available, the ‘full’ variation is performed.

Parameters:

method (str, default 'smpdf') – The method of PDF variation.

set_custom_variation(variations: list, method: str)

Define custom variations.

Parameters:
  • variations (list(str)) – Each string has to be of format “muR,muF,pdf,pdf_member”.

  • method (str) – The method used to compute the uncertainty from the variations.
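set_custom_variation() expects strings of the form "muR,muF,pdf,pdf_member". Below is a minimal sketch that builds such strings for the conventional 7-point scale variation, in which muR and muF are rescaled independently by factors of 1/2, 1 and 2, excluding the two combinations where they differ by a factor of 4. This convention, the helper name variation_strings, and whether the central point should be part of the list are assumptions. They are not taken from the client's own '7-point' implementation.

```python
# Conventional (k_R, k_F) factor pairs; assumed, not read from hightea-client.
SEVEN_POINT = [
    (k_r, k_f)
    for k_r in (0.5, 1, 2)
    for k_f in (0.5, 1, 2)
    if max(k_r, k_f) / min(k_r, k_f) < 4  # drop (0.5, 2) and (2, 0.5)
]


def variation_strings(mu_r: str, mu_f: str, pdf: str, pdf_member: int = 0,
                      factors=SEVEN_POINT) -> list:
    """Build "muR,muF,pdf,pdf_member" strings for Interface.set_custom_variation()."""
    if "," in mu_r or "," in mu_f:
        # the format is comma separated, so expressions like min(a, b) would be ambiguous
        raise ValueError("scale expressions must not contain commas in this format")
    return [f"{k_r}*({mu_r}),{k_f}*({mu_f}),{pdf},{pdf_member}" for k_r, k_f in factors]
```

A possible call is job.set_custom_variation(variation_strings("2*HTo4", "2*m_tt", "CT14nnlo"), "envelope"). This assumes that "envelope", one of the methods listed under DataHandler.compute_sys_error(), is accepted here.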

cuts(cuts)

Specify phase space cuts.

This allows constraining the phase space of the requested process. For processes that require generation cuts, the user cuts have to be more exclusive than the generation cuts. If they are not, the result corresponds to applying both the generation and the user cuts (which may render the user cuts irrelevant). In other words, the generation cuts are always applied on top of the user cuts.

Parameters:

cuts (list(str)) – Each string has to be an inequality involving defined variables.

jet_parameters(jet_parameters)

Specify jet parameters.

This allows specifying the parameters of the jet algorithm. This is only possible for processes for which a corresponding default parameter set is defined in the metadata.

The following parameters are available:
  • 'maxnjets': the number of jets returned by the algorithm

  • 'p' : the power of the kT-type algorithm (-1: anti-kT, 1: kT)

  • 'R' : the radius parameter

NOTE: As with cuts, results for processes that require a jet algorithm at the generation level are only correct for more exclusive definitions of the jet algorithm. This is more subtle than in the case of cuts, so these parameters should be used with care.

Parameters:

jet_parameters (dict) – A dict containing the keys ‘maxnjets’(int), ‘p’(int), ‘R’(float).

store()

Store job information on disk.
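Putting the Interface methods together, a job with scale and PDF uncertainties might look like the sketch below. It uses only the methods documented above, but the process key, the "NNLO" contribution, the variable names and the bin edges are illustrative assumptions that should be checked against the process metadata (job.process() prints it by default). Running it requires network access to the HighTEA server. The import sits inside the hypothetical function top_pair_job so that defining it does not require the package.

```python
def top_pair_job(jobname: str = "ttbar_uncertainties") -> dict:
    """Hypothetical example job: m_tt and (pt_t, y_t) histograms with uncertainties."""
    from hightea.client.interface import Interface

    job = Interface(jobname)
    job.process("pp_tt_13000_172.5")      # process key used in the examples above
    job.contribution("NNLO")               # assumed to exist for this process
    job.cuts(["pt_t > 10"])                # optional phase-space cut
    # 1D observable: a single list of bin edges
    job.observable("m_tt", [350., 450., 550., 650., 1000.])
    # 2D observable: one list of edges per variable
    job.observable(["pt_t", "y_t"], [[0., 100., 200., 400.], [-2., 0., 2.]])
    job.scale_variation("7-point")         # scale uncertainties
    job.pdf_variation()                    # PDF uncertainties ('smpdf' if available)
    job.request()                          # submit and wait for all histograms
    job.show_result()
    return job.result()                    # results including systematic uncertainties
```

Calling top_pair_job() submits the requests and blocks until they complete. For long computations, submit_request() and request_result() can be used instead of request().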

DataHandler class

class hightea.client.datahandler.DataHandler(data)

This class provides facilities to easily obtain physical quantities from raw scale/PDF variations.

The basic assumption is that the data used for initialization is the central result. The added data (with DataHandler.add_data()) is assumed to be the variations. The uncertainties are computed from all the data according to the specified method (‘envelope’, ’replicas’, ’hessian’, ’symhessian’) by invoking DataHandler.compute_uncertainty(). Each call to compute_uncertainty adds a “sys_error” to all bins and to the fiducial cross section. If compute_uncertainty is not invoked, DataHandler.get_result() returns the input.

get_result()

Return the result.

Returns:

result – A dictionary corresponding to the histogram data used in the constructor, extended by systematic errors if DataHandler.compute_uncertainty() has been used.

Return type:

dict

is_compatible(data)

Check whether the data set is compatible with the baseline data.

Returns:

test – True if the added data set is compatible

Return type:

bool

add_data(data: dict)

Add the result of a request to the handler. Prints an error if the data set is not compatible.

compute_sys_error(values: list, method: str, rescale_factor=1.0)

Compute the uncertainty from the provided values and return a dict containing ‘error_sys_pos’ and ‘error_sys_neg’.

Parameters:
  • values (list (float)) – A list of floats representing the variation of the value

  • method (str) –

    A string specifying the method used to compute the uncertainty from the provided list of numbers. The implemented methods are:
    • 'envelope': Return the maximal positive and negative distance to the central value.

    • 'replicas': Compute the uncertainty as the standard deviation of the numbers.

    • 'hessian': Assumes that the values correspond to a list of pairs and computes the uncertainty according to arXiv:0901.0002, Sec. 6.

    • 'symmhessian': Same as ‘hessian’, but assuming symmetric uncertainties.

  • rescale_factor (float (default 1)) – Rescale the computed uncertainty with a factor.

Returns:

result – A dict {‘method’:method,’pos’:sys_error_pos,’neg’:sys_error_neg}

Return type:

dict
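The four methods can be written out explicitly. Below is a minimal sketch, not the DataHandler implementation. In this sketch the central value is passed separately, 'replicas' uses the sample standard deviation, Hessian members are assumed to come in consecutive (plus, minus) pairs as in LHAPDF error sets, and both 'pos' and 'neg' are returned as non-negative magnitudes. For 'hessian' it follows the asymmetric formulas of arXiv:0901.0002, Sec. 6: the upward error is sqrt(sum_k max(F_k+ - F_0, F_k- - F_0, 0)^2), and the downward error is the analogue with the signs reversed. sys_error is a hypothetical name.

```python
import math
import statistics


def sys_error(central: float, values: list, method: str, rescale_factor: float = 1.0) -> dict:
    """Uncertainty of `central` from its variations `values` (sketch of the four methods)."""
    if method == "envelope":
        # largest upward and downward distance to the central value
        pos = max([v - central for v in values] + [0.0])
        neg = max([central - v for v in values] + [0.0])
    elif method == "replicas":
        # sample standard deviation of the replicas (ddof = 1 is an assumption)
        pos = neg = statistics.stdev(values)
    elif method in ("hessian", "symmhessian"):
        if len(values) % 2:
            raise ValueError("Hessian methods expect (plus, minus) pairs of values")
        pairs = list(zip(values[0::2], values[1::2]))
        if method == "hessian":
            # asymmetric formulas, arXiv:0901.0002 Sec. 6
            pos = math.sqrt(sum(max(p - central, m - central, 0.0) ** 2 for p, m in pairs))
            neg = math.sqrt(sum(max(central - p, central - m, 0.0) ** 2 for p, m in pairs))
        else:
            # symmetric version: half the distance between the members of each pair
            pos = neg = 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in pairs))
    else:
        raise ValueError(f"unknown method {method!r}")
    return {"method": method, "pos": rescale_factor * pos, "neg": rescale_factor * neg}
```

For example, with central value 10.0 and variations [9.0, 12.0, 10.5], the 'envelope' method gives pos = 2.0 and neg = 1.0.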

compute_uncertainty(method: str, rescale_factor=1.0)

Compute the uncertainty from the stored data. The result is stored internally and can be accessed with DataHandler.get_result().

SMPDFinput(directory: str, pdf: str, parameters={})

Write data in a format suitable for the SMPDF method.

It is assumed that the added data represent a full PDF variation. The header files contain some standard parameters that can be adapted by the user through the parameters argument. The specified directory is created.

Parameters:
  • directory (str) – Path specifying the output directory.

  • pdf (str) – The original PDF set.

  • parameters (dict) – A dict with parameters for the SMPDF input:

    • smpdf_nonlinear_corrections (bool, default False)

    • smpdf_tolerance (float, default 0.15)

    • order (int, default 2)

    • energy_scale (float, default 100)

API class

exception hightea.client.apiactions.RequestProblem

Base error that will be raised in case of problematic interactions with the API. Use the __cause__ attribute of the error to inspect the underlying problem.

class hightea.client.apiactions.API(*, auth=None, endpoint='https://www.hep.phy.cam.ac.uk/hightea/api/')

Helper class to interact with the HighTEA API.

property auth

Return authentication token.

Returns:

token – A string containing the authentication token.

Return type:

str

set_auth(auth)

Set the authentication token to be used in requests in the current session. If an authentication token has already been set, it is replaced.

Parameters:

auth (str) – A string containing an authentication token.

simple_req_no_json(method, url, data=None, form_data=None)

Call the endpoint with the specified parameters and return the response object. Raise a RequestProblem error in case of failure. The method and url parameters are passed on to the request method of the requests library. The data object is encoded as JSON.

Parameters:
  • method (str) – The request method, e.g. “GET”, “POST”.

  • url (str) – Request destination.

  • data (dict) – The data to be transmitted in form of a dictionary.

  • form_data (dict) – Additional information to be added to the request. Used for authentication.

Returns:

response – An object containing the response information. See the requests documentation.

Return type:

Response object

simple_req(method, url, data=None, form_data=None)

Call the endpoint with the specified parameters and return the JSON response. See API.simple_req_no_json().

Returns:

response – Returns the response to the request in JSON/dict format.

Return type:

dict

auth_code(username, password, admin=False)

Implementation of the authentication request.

Parameters:
  • username (str) – A string containing the username

  • password (str) – A string containing the password

  • admin (bool) – Request admin login (requires admin privileges) (optional).

Returns:

token – The authentication token to be used in requests.

Return type:

str

login(username, password, admin=False)

Perform login, i.e. submit username and password and store authentication token.

Parameters:
  • username (str) – A string containing the username

  • password (str) – A string containing the password

  • admin (bool) – Request admin login (requires admin privileges) (optional).

anonymous_login()

Log in anonymously. This functionality might be removed in the future.

make_invitation_url(admin: bool = False)

Generate a URL that can be used to register a new user.

Parameters:

admin (bool) – Whether the new user will be able to claim admin privileges.

Returns:

url – A URL to send to the user

Return type:

str

wait_token_impl(token)

Block for the specified token until it is completed. Use this method to implement interactive behaviours while the computation is in progress. Otherwise use higher level methods such as API.wait_token_json() or API.wait_token_plot().

Parameters:

token (str) – A token representing a previous result, to wait for.

Yields:

token_status (dict) – A dictionary containing information about the token.

get_plot(token)

Return a histogram plot for a computed token.

Parameters:

token (str) – A token representing a previous result.

Returns:

plot – A byte string representing a figure in the PNG format.

Return type:

bytes

Notes

If the computation corresponding to the token is not finalized, this method will fail. Use API.wait_token_plot() to block until the result is ready.

wait_token_json(token)

Block for the specified token and return a JSON result.

Parameters:

token (str) – A token representing a previous result, to wait for.

Returns:

result – A dictionary representing the result of the computation.

Return type:

dict

wait_token_plot(token)

Block until the specified token is available, then return a histogram plot.

Parameters:

token (str) – A token representing a previous result, to wait for.

Returns:

plot – A byte string representing a figure in the PNG format.

Return type:

bytes

list_pdfs()

List the available PDFs for central value computations.

Returns:

pdfs – A list of LHAPDF ids.

Return type:

list

list_processes()

List the processes available in the server.

Returns:

processes – A list of processes

Return type:

list

get_metadata(proc)

Retrieve the metadata for the specified process.

Returns:

metadata – A dictionary containing the metadata.

Return type:

dict

request_hist(proc, data)

Submit histogram request to server.

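Parameters:
  • proc (str) – String containing the process key.

  • data (dict) – The request payload, following the schema of api/processes/<PROCESS NAME>/hist.

Returns: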

token – A token representing the request. Results can be obtained with API.wait_token_json() or API.wait_token_plot().

Return type:

str

check_token(token)

Retrieve information about a token.

Parameters:

token (str) – A token representing a previous result.

Returns:

info – A dictionary with the information on a specific token.

Return type:

dict
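The lower-level API class can be used directly when finer control is needed. Below is a minimal sketch that combines the methods documented above: log in (or log in anonymously, which may be removed in the future), submit a histogram request and block until its JSON result is available. The function name fetch_histogram is hypothetical, payload must follow the api/processes/<PROCESS NAME>/hist schema, and whether a login is required on a given server is an assumption. The import sits inside the function so that defining it does not require the package.

```python
def fetch_histogram(process: str, payload: dict, username=None, password=None) -> dict:
    """Submit payload for process and block until the JSON result is available."""
    from hightea.client.apiactions import API, RequestProblem

    api = API()  # default endpoint https://www.hep.phy.cam.ac.uk/hightea/api/
    if username is not None:
        api.login(username, password)
    else:
        api.anonymous_login()
    try:
        token = api.request_hist(process, payload)
        print("token:", token)  # can later be inspected with api.check_token(token)
        return api.wait_token_json(token)
    except RequestProblem as err:
        # RequestProblem keeps the underlying problem in __cause__
        raise RuntimeError(f"HighTEA request failed: {err.__cause__!r}") from err
```

Wrapping RequestProblem this way surfaces the underlying cause (for example an HTTP error) in the message, as suggested in the description of the exception.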
