Welcome to HighTEA client’s documentation
User software for interacting with the HighTEA database from the Centre for Precision Studies in Particle Physics. See the central website of HighTEA (High-energy Theory Event Analysis):
http://www.precision.hep.phy.cam.ac.uk/hightea/
and the physics publication:
View this README including the inline documentation of the python library on ReadTheDocs.
Installation
This package is available on PyPI via:
pip install hightea-client
The Python library interface can be imported via import hightea.client. For a quick start, we recommend that new users look at the examples and tutorial, which can be found here:
For details on the functionality please refer to the ReadTheDocs documentation.
The HighTEA API
Through the interface, users make web requests to the HighTEA database service. Dynamic documentation of the API can be found here.
Here is a description of the most user-relevant APIs:
api/processes: Returns a key-value map where the keys are the identifiers for the available processes and the value is the short description of the process. The identifying keys are to be used in subsequent API calls to retrieve metadata or perform analysis on the process. For example, browsing to https://www.hep.phy.cam.ac.uk/hightea/api/processes/ yields
{
  "processes/pp_jx_7TeV": "pp -> j + X at 7 TeV",
  "processes/pp_aax_8TeV": "pp -> a a + X at 8 TeV",
  "processes/pp_ttx_13TeV": "pp -> t tbar +X at 13 TeV with mt = 172.5 GeV"
}
This API call takes no parameters.
api/processes/<PROCESS NAME>: Returns a key-value mapping with the metadata for the process with identifier <PROCESS NAME>. Details about the metadata provided can be found below. This API call takes no parameters.
api/processes/<PROCESS NAME>/hist: Returns a histogram for the given process based on the user input. The API call must be made using the POST request method and contain data as a JSON key-value mapping which conforms to the following schema:
- "observables" (required, list of observable specifications): A list of dictionaries, each containing:
  - "name" (optional): A label for the observable.
  - "binning": A list of dictionaries. A one-dimensional histogram contains one element, a two-dimensional histogram two elements, and so on. Each element contains:
    - "variable": Variable in which to bin.
    - "bins": An ordered list of numbers defining the edges of the bins. The literal Infinity is allowed (as is -Infinity).
  The resulting histogram will correspond to the outer product of all the bin specifications in the list.
- "custom_variables" (optional, key-value mapping): A map of names to expressions in terms of pre-existing variables or particle momenta. The variables defined here become available for usage in bin specifications, scale definitions and cuts. The expression syntax is described in more detail below.
- "cuts" (optional, list of cuts): A list of inequalities between expressions. The histogram will only consider events for which the inequalities are fulfilled. The expression syntax is described in more detail below.
- "pdf" (optional, string): The PDF set to be used to reweight the events. If given, $\alpha_S$ and the scales will be evaluated to match the new set.
- "muR" (optional, string): An expression corresponding to the renormalization scale used for reweighting.
- "muF" (optional, string): An expression corresponding to the factorization scale used for reweighting.
- "contributions" (optional, list of strings): A subset of the contributions or contribution groups in a process, typically specifying the perturbative order. If specified, only the weights associated with the listed contributions will be taken into account. By default all contributions are evaluated, which corresponds to the highest perturbative order.
An example of a valid request payload is:
{
  "contributions": ["NNLO"],
  "custom_variables": {"circle": "sqrt(pt_top**2 + pt_tbar**2)"},
  "cuts": ["y_tbar < 4", "pt_tbar > 10"],
  "pdf": "CT14nnlo",
  "pdf_member": 0,
  "muR": "2*HTo4",
  "muF": "2*m_tt",
  "observables": [
    {
      "name": "my2Dobservable",
      "binning": [
        {
          "variable": "circle",
          "bins": [0, 20, 40, 60, 80, "Infinity"]
        },
        {
          "variable": "m_tt",
          "bins": [350, 450, 550, 650, "Infinity"]
        }
      ]
    }
  ]
}
api/available_pdfs: Returns a list with the available PDFs that can be used for reweighting.
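The request payload for api/processes/<PROCESS NAME>/hist described above can be assembled in Python before it is sent. The following is a minimal sketch using only the standard library; the helper names (encode_edges, build_payload, count_cells) are illustrative and are not part of hightea-client. It writes infinite bin edges as the string "Infinity", as in the example payload (using the string "-Infinity" for the negative case is an assumption), and counts the cells of the outer product of the bin specifications.

```python
import json
import math


def encode_edges(edges):
    """Replace infinite bin edges by strings, as in the example payload.

    The string form "-Infinity" is an assumption; the text only shows "Infinity".
    """
    encoded = []
    for edge in edges:
        if math.isinf(edge):
            encoded.append("Infinity" if edge > 0 else "-Infinity")
        else:
            encoded.append(edge)
    return encoded


def build_payload(binning, **options):
    """Assemble a histogram request with a single observable.

    `binning` is a list of (variable, edges) pairs, one per dimension.
    """
    for _, edges in binning:
        if list(edges) != sorted(edges):
            raise ValueError("bin edges must be given in increasing order")
    observable = {
        "binning": [
            {"variable": variable, "bins": encode_edges(edges)}
            for variable, edges in binning
        ]
    }
    return {"observables": [observable], **options}


def count_cells(binning):
    """Number of cells in the outer product of all bin specifications."""
    return math.prod(len(edges) - 1 for _, edges in binning)


binning = [
    ("circle", [0, 20, 40, 60, 80, math.inf]),
    ("m_tt", [350, 450, 550, 650, math.inf]),
]
payload = build_payload(
    binning,
    contributions=["NNLO"],
    custom_variables={"circle": "sqrt(pt_top**2 + pt_tbar**2)"},
    cuts=["y_tbar < 4", "pt_tbar > 10"],
)

# allow_nan=False makes json.dumps fail on any remaining float infinity,
# which would otherwise be written as a bare (non-standard) Infinity token.
print(json.dumps(payload, indent=2, allow_nan=False))
print(count_cells(binning), "cells")
```

Here the two-dimensional observable has 5 x 4 bins, so the outer product contains 20 cells.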
Description of metadata
The returned mapping includes (but is not restricted to) the following keys:
- name (string): The short description of the process.
- details (string): A more detailed description of the process.
- layout (list): A specification of the particles available for analysis. Currently the valid specifications for an item in the list are:
  - A key-value pair "particle_momenta": <NAME OF THE PARTICLE>. This indicates that the momenta of the listed particles are available for analysis. They are available as variables of the form p_<NAME OF THE PARTICLE>_<u>, where <u> is the 4-momentum index, one of {0, 1, 2, 3}. The index 0 corresponds to the energy, the indices 1 and 2 correspond to the transverse momentum components, and the index 3 corresponds to the longitudinal momentum. All momenta are provided in the laboratory frame. For example
    "layout": [
      {"particle_momenta": "t"},
      {"particle_momenta": "tbar"}
    ]
    in the top pair production process means that the variable p_t_0 is the energy of the top quark and p_tbar_3 is the longitudinal momentum of the anti-top quark.
    In the case of final-state jets, the layout provides the parton momenta, which are clustered with a jet algorithm (either the default as specified below or according to the request). The jets are ordered with respect to their transverse momentum and can be accessed by p_j1_0 etc.
- variables (map of string to string): A mapping of variable names to expressions, corresponding to the default predefined variables available in the analysis. For example
  "variables": {
    "pt_t": "sqrt(p_t_1**2 + p_t_2**2)",
    "pt_tbar": "sqrt(p_tbar_1**2 + p_tbar_2**2)",
    "y_t": "0.5*log((p_t_0 + p_t_3)/(p_t_0 - p_t_3))",
    "y_tbar": "0.5*log((p_tbar_0 + p_tbar_3)/(p_tbar_0 - p_tbar_3))"
  }
  defines the transverse momentum and rapidity for the top and anti-top quarks.
- default_jet_parameters (dictionary): A mapping containing the default values for nmaxjet, p and R.
- scales_info (string): A short description of the default scales.
- muR0 (string): The default renormalisation scale expressed in terms of a predefined variable.
- muF0 (string): The default factorisation scale expressed in terms of a predefined variable.
- pdf_set (string): The default PDF set.
- pdf_member (string): The default PDF member.
- contribution_groups (dictionary): A mapping of contributions to the sub-contributions.
- available_pdfs (dictionary): Detailed information about PDF sets available for this process, potentially containing specialised process-specific PDFs. The additional information is used by the hightea.client package to automatise PDF uncertainty estimations.
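To make the relation between the layout momenta and the predefined variables concrete, the sketch below evaluates two entries of the example "variables" mapping for a single momentum configuration. The momentum values are invented, and evaluating the expressions locally with Python's eval is an assumption made purely for illustration; how the server evaluates expressions is not described here.

```python
import math

# Two entries of the "variables" mapping from the metadata example.
variables = {
    "pt_t": "sqrt(p_t_1**2 + p_t_2**2)",
    "y_t": "0.5*log((p_t_0 + p_t_3)/(p_t_0 - p_t_3))",
}

# Invented laboratory-frame components (E, px, py, pz) of the top quark.
momenta = {"p_t_0": 250.0, "p_t_1": 30.0, "p_t_2": 40.0, "p_t_3": 150.0}

# Expose only the functions the expressions need, and no builtins.
functions = {"sqrt": math.sqrt, "log": math.log}

values = {
    name: eval(expression, {"__builtins__": {}}, {**functions, **momenta})
    for name, expression in variables.items()
}
print(values)
```

With these numbers the transverse momentum is 50 and the rapidity is log(2), about 0.693.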
Writing expressions
Several API parameters (cuts, muR, muF, custom_variables) involve expressions. These correspond to mathematical functions of the variables. Expressions are written using the conventions of the Python language (e.g. the operator ** is used for exponents and function calls use parentheses). Expressions admit:
- References to particle momenta for the particles defined in the layout (e.g. p_tbar_0).
- References to the predefined variables for the process.
- References to custom_variables.
- Mathematical operations such as +, / or **.
- Parentheses.
- Arithmetic and trigonometric functions such as sqrt, log or min.
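Because expressions follow Python conventions, they can be inspected locally before a request is submitted, for instance to catch a misspelled variable name. The sketch below uses the standard ast module to list the variables and functions an expression refers to. The helper referenced_names and the set of known variables are illustrative assumptions; this is a client-side pre-check, not a description of the server's parser.

```python
import ast


def referenced_names(expression):
    """Return (variables, functions) referenced by a Python-style expression."""
    tree = ast.parse(expression, mode="eval")
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - called, called


# Assumed set of predefined variables, taken from the metadata example.
known = {"pt_t", "pt_tbar", "y_t", "y_tbar"}

variables, functions = referenced_names("sqrt(pt_t**2 + pt_tbar**2)")
unknown = variables - known
print(sorted(variables), sorted(functions), sorted(unknown))
```

The same helper works for cuts, since an inequality such as "y_tbar < 4" is also a valid Python expression.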
The HighTEA CLI
An alternative way to interact with the HighTEA API is the highteacli command-line interface. This executable should be available after installation of the hightea-client package.
The basic workflow consists of providing requests in the JSON format as input and then analysing the resulting output.
The most frequent command is:
highteacli hist <PROCESS NAME> <PATH TO THE JSON FILE FROM THE CURRENT DIRECTORY>
For 1D histograms, you can add the --plot argument to obtain a quick visualization of the result.
Available processes (to fill in <PROCESS NAME>) can be queried with
highteacli lproc
The format of the input is described in detail in the API section, and examples are provided.
For example, computing the rapidity (y) distribution of the top quark in a top-quark pair production process can be achieved with the following input file (test.json):
{
  "observables": [
    {
      "binning": [
        {
          "variable": "y_t",
          "bins": [-2, -1, 0, 1, 2]
        }
      ]
    }
  ]
}
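An input file of this kind can also be generated from Python, which guards against JSON syntax slips such as trailing commas. The following is a minimal sketch using only the standard library; it writes "binning" as a list, following the schema in the API section, and uses a temporary directory only to keep the example free of side effects.

```python
import json
import tempfile
from pathlib import Path

request = {
    "observables": [
        {"binning": [{"variable": "y_t", "bins": [-2, -1, 0, 1, 2]}]}
    ]
}

# In practice the file would be written to the directory from which
# highteacli is run; a temporary directory is used here for illustration.
with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "test.json"
    path.write_text(json.dumps(request, indent=2))
    reloaded = json.loads(path.read_text())

print(reloaded == request)
```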
Now we can query the 13 TeV top-quark pair dataset as follows
$ highteacli hist pp_tt_13000_172.5 test.json
Processing request. The token is cb7a4c94edea11ea8bc49d8a216f62d5.
Wait for the result here or run
highteacli token cb7a4c94edea11ea8bc49d8a216f62d5
-Token completed
Result written to cb7a4c94edea11ea8bc49d8a216f62d5.json
Each successful invocation of the command generates a unique ID (token) that is associated with the requested computation. With the default options, the token name is used to generate the filename.
You can recover data on an existing token, possibly with a simple visualization for 1D histograms, which will be written in the current directory.
$ highteacli token --plot cb7a4c94edea11ea8bc49d8a216f62d5
|Token completed
Result written to cb7a4c94edea11ea8bc49d8a216f62d5.json
/
Histogram plot writen to cb7a4c94edea11ea8bc49d8a216f62d5.png
The full set of options can be seen with the --help flag.
$ highteacli --help
And specific help for each command can be obtained with
$ highteacli --help <COMMAND>
hightea-client class documentation
Interface class
- class hightea.client.interface.Interface(name: str, directory='.', overwrite=False, *, auth=None, endpoint='https://www.hep.phy.cam.ac.uk/hightea/api/')
High-level user interface to the HighTEA database
Examples
>>> job = hightea('jobname')
>>> job.process('pp_tt_13000_172.5')
>>> job.contribution('LO')
>>> job.observable('pt_t',[0.,50.,100.,150.,200.,250.])
>>> job.request()
>>> job.show_result()
- list_processes(detailed=True)
Request the list of available processes from the server.
- Parameters:
detailed (bool, default True) – If True detailed information for each process is provided, if False only the process key is shown.
- list_pdfs()
Request the list of available PDFs from the server.
- process(proc: str, verbose=True)
Define process for this instance. A request to the server is performed and the process’ metadata is stored.
- define_new_variable(name: str, definition: str)
Define a new variable. The definition has to be a Python expression using predefined variables; see the process metadata for additional information.
- add_variable_definitions(definitions: dict)
Add variable definitions from dictionary.
The argument is expected to be a dictionary of "name":"definition" pairs.
- Parameters:
definitions (dict) – The dictionary containing the definitions.
- store_variable_definitions(filename: str)
Store variable definitions to file.
- Parameters:
filename (str) – The filename containing the definitions.
- load_variable_definitions(filename: str)
Load variable definitions from file.
The specified file is expected to be a JSON dictionary of "name":"definition" pairs.
- Parameters:
filename (str) – The filename containing the definitions.
- request()
Create and submit requests for all defined histograms and wait for the answer. The results are stored.
- submit_request()
Submit request to database but don’t wait for the answer. The answer can be received via request_result().
- request_result()
Request the results for submitted request. If all tokens have been completed the function returns true otherwise false.
- Returns:
finished – If all requested tokens have been completed the return value is True, False otherwise.
- Return type:
bool
- get_requests()
Return the stored requests. This includes not only the requests but also the results in raw format.
Notes
This routine is meant for debugging and troubleshooting.
- result()
Return the results of a request, taking into account systematic uncertainties from requested variations.
The returned dictionary can be used within the hightea-plotting routines.
- Returns:
result – A dictionary containing the results.
- Return type:
dict
- raw_result()
Return the raw results of a job.
The returned dictionary can be used within the hightea-plotting routines.
- show_result()
Print the result in a human readable form
- contribution(con)
Define contribution(s) for histogram.
- observable(variable, binning, name=None)
Add an observable defined by a variable and bin specification.
- scales(muR: str, muF: str)
Define the central scale choices, i.e. the central choice for the renormalization (muR) and factorization (muF) scales.
- pdf(pdf: str, pdf_member=0)
Define the PDF.
- Parameters:
pdf (str) – PDF name (refer to Interface.list_pdfs()).
pdf_member (int, default 0) – Specify PDF member.
- scale_variation(variation_type: str)
Specify the type of scale variations
- Implemented variations:
'3-point': 3-point variation around central scales
'7-point': 7-point variation around central scales
More individual types of variation can be specified via Interface.set_custom_variation().
- Parameters:
variation_type (str) – String corresponding to a defined variation.
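As an illustration of what the two variation types usually mean, the sketch below lists the (muR, muF) multipliers under the conventional definitions: a factor of 2 up and down, with both scales varied together for '3-point', and independent variation excluding the two opposite extremes for '7-point'. These definitions are assumptions based on common practice; the client's own implementation is not shown here and may differ. The function scale_factors is a hypothetical helper, not part of hightea-client.

```python
from itertools import product

FACTORS = (0.5, 1.0, 2.0)  # assumed conventional factor-of-two variation


def scale_factors(variation_type):
    """Return (muR, muF) multipliers under the conventional definitions."""
    if variation_type == "3-point":
        # Both scales are varied together.
        return [(f, f) for f in FACTORS]
    if variation_type == "7-point":
        # Independent variation, dropping (0.5, 2.0) and (2.0, 0.5).
        return [
            (r, f)
            for r, f in product(FACTORS, repeat=2)
            if {r, f} != {0.5, 2.0}
        ]
    raise ValueError(f"unknown variation type: {variation_type}")


for kind in ("3-point", "7-point"):
    print(kind, scale_factors(kind))
```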
- pdf_variation(method='smpdf')
Include PDF member variation
There are two different methods of PDF variation. Standard ‘full’ variation and the more efficient ‘smpdf’ variation. If not specified explicitly the client tries to use the more efficient ‘smpdf’ variation, depending on the availability of a corresponding reduced PDF set. If ‘smpdf’ is not available, the ‘full’ variation is performed.
- Parameters:
method (str, default 'smpdf') – The method of PDF variation.
- cuts(cuts)
Specify phase space cuts.
This allows the user to constrain the phase space for the requested process. For processes which required generation cuts, the user cuts have to be more exclusive than the generation cuts. If they are not, the result will correspond to the union of generation and user cuts only (which may render the user cuts irrelevant). In other words, the generation cuts are always applied on top of the user cuts.
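A toy example may help to see why user cuts looser than the generation cuts have no effect. In the sketch below all numbers are invented: the stored sample only contains events that passed a generation-level cut on a jet transverse momentum, so a looser user cut cannot bring back the events that were never generated.

```python
GENERATION_PT_MIN = 30.0  # invented generation-level cut

# Invented jet transverse momenta of a hypothetical "true" event population.
all_events = [5.0, 20.0, 40.0, 60.0]

# Only events passing the generation cut exist in the stored sample.
stored = [pt for pt in all_events if pt > GENERATION_PT_MIN]


def select(user_pt_min):
    """Apply a user cut to the stored sample."""
    return [pt for pt in stored if pt > user_pt_min]


loose = select(10.0)  # looser than the generation cut: nothing changes
tight = select(50.0)  # more exclusive than the generation cut: takes effect
print(loose, tight)
```

The loose selection returns the same events as no user cut at all, while events with transverse momentum between 10 and 30 are absent from the result.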
- jet_parameters(jet_parameters)
Specify jet parameters.
This allows the user to specify parameters for the jet algorithm. This is possible for processes where a corresponding default parameter set is defined in the metadata.
- The following parameters are available:
'maxnjets': the number of jets returned by the algorithm
'p': the power of the kT algorithm (-1: anti-kT, 1: kT)
'R': the radius parameter
NOTE: Please be advised that, similar to cuts, results for processes that require a jet algorithm at the generation level are only correct for more exclusive definitions of the jet algorithm. This is a bit more subtle for the jet algorithm than for cuts, and therefore these parameters should be used carefully.
- Parameters:
jet_parameters (dict) – A dict containing the members ‘maxnjets’(int), ‘p’(int), ‘R’(float).
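For orientation, the roles of 'p' and 'R' can be read off the standard generalised-kT distance measures, sketched below. The formulas are the textbook definitions; that the server implements exactly these is an assumption, and the parameter values and momenta are invented.

```python
import math


def pair_distance(pt_i, pt_j, delta_r, p, R):
    """Generalised-kT distance between two particles."""
    return min(pt_i ** (2 * p), pt_j ** (2 * p)) * delta_r**2 / R**2


def beam_distance(pt_i, p):
    """Generalised-kT distance between a particle and the beam."""
    return pt_i ** (2 * p)


jet_parameters = {"maxnjets": 2, "p": -1, "R": 0.4}  # illustrative values

# A hard (pT = 100) and a soft (pT = 10) particle separated by delta_r = 0.2.
d_ij = pair_distance(
    100.0, 10.0, delta_r=0.2, p=jet_parameters["p"], R=jet_parameters["R"]
)
d_iB = beam_distance(100.0, jet_parameters["p"])

# With p = -1 (anti-kT) the pair distance is the smaller one, so the two
# particles are clustered before the hard particle is declared a jet.
print(d_ij < d_iB)
```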
- store()
Store job information on drive
DataHandler class
- class hightea.client.datahandler.DataHandler(data)
This class provides facilities to easily obtain physical quantities from raw scale/PDF variations.
The basic assumption is that the data used for initialization is the central result. The added data (with DataHandler.add_data()) is assumed to be the variations. The uncertainties are computed from all the data according to the specified method (‘envelope’, ‘replicas’, ‘hessian’, ‘symhessian’) by invoking DataHandler.compute_uncertainty(). Each call to compute_uncertainty adds a “sys_error” to all bins and the fiducial cross section. If compute_uncertainty is not invoked, DataHandler.get_result() returns the input.
- get_result()
Return the result.
- Returns:
result – A dictionary which corresponds to the histogram data used in the constructor but extended by systematic errors if DataHandler.compute_uncertainty() has been used.
- Return type:
dict
- is_compatible(data)
Checks if the data set is compatible with the baseline data.
- Returns:
test – True if the added data set is compatible.
- Return type:
bool
- add_data(data: dict)
Adds the result of a request to handler. Prints an error if the data set is not compatible.
- compute_sys_error(values: list, method: str, rescale_factor=1.0)
Compute the uncertainty from provided values and return dict containing ‘error_sys_pos’ and ‘error_sys_neg’.
- Parameters:
values (list (float)) – A list of floats representing the variation of the value.
method (str) – A string specifying the method to compute the uncertainty from the provided list of numbers. Implemented are:
'envelope': Return the maximal positive and negative distance to the central value.
'replicas': Compute the uncertainty from the standard deviation of the numbers.
'hessian': Assumes that the values correspond to a list of pairs and computes the uncertainty according to arXiv:0901.0002, Sec. 6.
'symmhessian': Same as ‘hessian’, assuming however symmetric uncertainties.
rescale_factor (float (default 1)) – Rescale the computed uncertainty with a factor.
- Returns:
result – A dict {‘method’:method,’pos’:sys_error_pos,’neg’:sys_error_neg}
- Return type:
dict
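The uncertainty methods listed for compute_sys_error can be illustrated with a few lines of standalone Python. The sketch below is not the client's code: the function names, the explicit central argument, the use of the population standard deviation for 'replicas', and the asymmetric Hessian prescription (quadrature sum of the largest upward and downward shifts per eigenvector pair) are all assumptions. The input numbers are invented.

```python
import math
import statistics


def envelope(central, values):
    """Maximal positive and negative distance to the central value."""
    pos = max(max(values) - central, 0.0)
    neg = max(central - min(values), 0.0)
    return {"method": "envelope", "pos": pos, "neg": neg}


def replicas(values):
    """Symmetric uncertainty from the (population) standard deviation."""
    std = statistics.pstdev(values)
    return {"method": "replicas", "pos": std, "neg": std}


def hessian(central, pairs):
    """Asymmetric Hessian uncertainty from (plus, minus) eigenvector pairs."""
    pos = math.sqrt(
        sum(max(p - central, m - central, 0.0) ** 2 for p, m in pairs)
    )
    neg = math.sqrt(
        sum(max(central - p, central - m, 0.0) ** 2 for p, m in pairs)
    )
    return {"method": "hessian", "pos": pos, "neg": neg}


print(envelope(10.0, [9.0, 10.5, 11.0]))
print(replicas([9.0, 10.0, 11.0]))
print(hessian(10.0, [(10.3, 9.6), (10.4, 9.7)]))
```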
- compute_uncertainty(method: str, rescale_factor=1.0)
Compute the uncertainty from the stored data. The result is stored internally and can be accessed with DataHandler.get_result().
- SMPDFinput(directory: str, pdf: str, parameters={})
Write data in format suitable for the SMPDF method
It is assumed that the added data represent a full PDF variation. The header files contain some standard parameters which might be adapted by the user through the parameter argument. A directory as specified is created.
API class
- exception hightea.client.apiactions.RequestProblem
Base error that will be raised in case of problematic interactions with the API. Use the __cause__ attribute of the error to inspect the underlying problem.
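The __cause__ attribute is standard Python exception chaining, set by raise ... from. The sketch below shows the mechanism with a local stand-in class and a simulated failure, so that it runs without the package or a network connection; the class defined here only mimics the name of hightea.client.apiactions.RequestProblem, and the kind of underlying error is an assumption.

```python
class RequestProblem(Exception):
    """Local stand-in for hightea.client.apiactions.RequestProblem."""


def fetch():
    """Simulate a request whose low-level failure is wrapped and re-raised."""
    try:
        raise ConnectionError("server unreachable")  # simulated failure
    except ConnectionError as error:
        raise RequestProblem("request failed") from error


try:
    fetch()
except RequestProblem as problem:
    # The original exception is preserved on the wrapping error.
    cause = problem.__cause__
    print(type(cause).__name__, cause)
```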
- class hightea.client.apiactions.API(*, auth=None, endpoint='https://www.hep.phy.cam.ac.uk/hightea/api/')
Helper class to interact with the HighTEA API.
- property auth
Return authentication token.
- Returns:
token – A string containing the authentication token.
- Return type:
str
- set_auth(auth)
Set authentication token to be used in requests in current session. If authentication has already been set, remove it.
- Parameters:
auth (str) – A string containing an authentication token.
- simple_req_no_json(method, url, data=None, form_data=None)
Call the endpoint with the specified parameters and return the response object. Raise a RequestProblem error in case of failure. The method and url parameters are passed to requests.Request.request(). The data object is encoded as JSON.
- Parameters:
- Returns:
response – An object containing the response information. See the requests implementation.
- Return type:
Response object
- simple_req(method, url, data=None, form_data=None)
Call the endpoint with the specified parameters and return the JSON response. See API.simple_req_no_json().
- Returns:
response – Returns the response to the request in JSON/dict format.
- Return type:
dict
- auth_code(username, password, admin=False)
Implementation of the authentication request.
- login(username, password, admin=False)
Perform login, i.e. submit username and password and store authentication token.
- anonymous_login()
A method to anonymously login. This functionality might be removed in the future.
- wait_token_impl(token)
Block for the specified token until it is completed. Use this method to implement interactive behaviours while the computation is in progress. Otherwise use higher-level methods such as API.wait_token_json() or API.wait_token_plot().
- Parameters:
token (str) – A token representing a previous result, to wait for.
- Yields:
token_status (dict) – A dictionary containing information relative to the token.
- get_plot(token)
Return a histogram plot for a computed token.
- Parameters:
token (str) – A token representing a previous result, to wait for.
- Returns:
plot – A byte string representing a figure in the PNG format.
- Return type:
bytes
Notes
If the computation corresponding to the token is not finalized, this method will fail. Use
API.wait_token_plot()to block until the result is ready.
- wait_token_json(token)
Block for the specified token and return a JSON result.
- wait_token_plot(token)
Block until the specified token is available. When it is, return a histogram representation.
- list_pdfs()
List the available PDFs for central value computations.
- Returns:
pdfs – A list of LHAPDF ids.
- Return type:
list
- list_processes()
List the processes available in the server.
- Returns:
processes – A list of processes
- Return type:
- get_metadata(proc)
Retrieve metadata for specified process
- Returns:
metadata – A dictionary containing metadata.
- Return type:
dict
- request_hist(proc, data)
Submit histogram request to server.
- Parameters:
proc (str) – A tag specifying a process. See API.list_processes().
data (dict) – A dictionary defining the details of the request. For information about the expected structure and possible options, please refer to the README (https://github.com/HighteaCollaboration/hightea-client).
- Returns:
token – A token representing the request. Results can be obtained with API.wait_token_json() or API.wait_token_plot().
- Return type: