Azure Resource Group Security and Authentication

The security configuration of Azure services is a fairly broad subject, so this is more of an introduction to how it works for a Resource Group, in the context of a Logic App that needs to interact with multiple resources. If a DevOps team, rather than the developers, is managing the security, a considerable amount of time will be spent planning and designing services beforehand, and later trying to figure out which security configurations are preventing the services from working. This is even more of a problem with Logic Apps on a standard subscription, as they can’t be saved unless everything validates and runs.

There are two categories of authentication methods for Logic Apps:

  • a) The Logic App needs to authenticate with external services, APIs, whatever’s being integrated, etc., which means it will need API keys and tokens. This sensitive data needs a secure method of storage and access.
  • b) Some components in the Resource Group need to authenticate with other components, for example the Key Vault.

App Registrations

The first thing a Resource Group will need is an App Registration, which is an object in Active Directory; there is an ‘App registrations‘ section in the portal for this. Its profile in the Azure Portal will list endpoints for various authentication methods, a store for client secrets and certificates, and the owners. Not all the permissions granted to the App Registration are managed in Active Directory, though; access to resources in the Resource Group is granted separately, through role assignments. The App Registration is where we might enable access to Microsoft Graph, under ‘API permissions‘.

Managed Identities

A Logic App will also have a Managed Identity: a system-assigned identity that the Logic App can use for authenticating with other resources in the Resource Group. It’s essentially an Object (principal) ID in Active Directory, which can then be assigned permissions.

The configuration/profile for a Managed Identity can be viewed by selecting the ‘Identity‘ option in the Settings section for the Logic App. The Role-Based Access Control (RBAC) access policies for the Managed Identity can be viewed and modified under ‘Azure role assignments‘. In the case of the Logic App I’m looking at, it has the following:

  • Storage Blob Data Contributor
  • Key Vault Secrets User

When one of these is selected, the Azure Portal will display the actions that can be performed with that role.
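As a rough illustration of what one of these role assignments amounts to, it could be declared in an ARM template along the following lines (a sketch with placeholder values, not taken from the actual Resource Group):

{
    "type": "Microsoft.Authorization/roleAssignments",
    "apiVersion": "2022-04-01",
    "name": "[guid(resourceGroup().id, 'logicapp-blob-contributor')]",
    "properties": {
        "principalId": "<the Logic App's Object (principal) ID>",
        "principalType": "ServicePrincipal",
        "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '<GUID of the Storage Blob Data Contributor role definition>')]"
    }
}

The point being that a role assignment is just a mapping between the identity’s principal ID and a role definition, scoped to a resource or Resource Group.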

Key Vault

A component in a Resource Group might need to use an API key to make requests to an external service. This API key might be stored in a Key Vault (recommended), which the Logic App would need access to. I’ve covered Key Vaults before, and how access to them can be restricted to specific Managed Identities and IP address ranges.

Putting it all together, and accessing an external API

A Logic App I’m currently working on has a Managed Identity, which has ‘Key Vault Reader’ permissions. From the Key Vault, it reads a secret, which is used as the password for getting an OAuth2 token from Azure Active Directory. The request for this looks something like:
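Sketched with illustrative parameter names (the secret from the Key Vault being supplied as the password in the HTTP action’s authentication settings), it is a POST to the Active Directory token endpoint:

POST https://login.microsoftonline.com/@{parameters('TenantId')}/oauth2/v2.0/token
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials&scope=api://@{parameters('ApplicationScope')}/.default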

The OAuth token that’s returned authenticates the Logic App with an external API. The TenantID here identifies the Active Directory tenant containing the App Registration for that service.

So, the sequence is:

  • Get secret from the Key Vault
  • Get OAuth2 token, using username/password authentication. The password for this is the secret from the Key Vault.
  • Declare the OAuth2 token as an AuthenticationKey variable.
  • The AuthenticationKey variable is used as a header value in external API requests.

An API request can be of the following format, in the Logic App’s code view:
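A sketch, assuming an ‘API-URL‘ parameter and a hypothetical ‘products‘ endpoint:

"inputs": {
    "method": "GET",
    "uri": "@{parameters('API-URL')}/products",
    "headers": { "Authorization": "@variables('AuthenticationKey')" }
}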

Authenticating API Requests in an Azure Logic App

One way to help understand how Active Directory, App Registrations, managed identities and Key Vaults work in Microsoft Azure is to look at a typical Logic App that authenticates itself with an API that provides data only to services registered with the organisation’s Active Directory.

The basic sequence for the authentication part of the Logic App here is this:

  1. Get a password from an Azure Key Vault.
  2. Get an OAuth token for the Logic App from Active Directory.
  3. Extract a value from the token response and use it as the Authorization header value for HTTP requests to the API.

Note: For some reason, Microsoft has renamed Active Directory to ‘Entra ID’, but I’ll still refer to it as Active Directory to avoid confusion.

Get Secret

The purpose of this step is to read a password that’s stored in an Azure Key Vault. Obviously our Logic App (or its Resource Group) needs a ‘System-assigned managed identity‘ to access this.

Use this configuration when creating a new connection:

  • Default Azure AD authentication for OAuth, because access to the Key Vault is managed by Azure AD.
  • TenantID of the organisation’s Active Directory instance.
  • Name of the Key Vault. There might be more than one in the Resource Group.

In order to access the Key Vault, the Logic App must be added to its Access policies.
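In code view, the resulting action will look roughly like the following. This is a sketch for a Consumption-style workflow definition, with an illustrative connection name and secret name:

"Get_Secret": {
    "type": "ApiConnection",
    "inputs": {
        "host": {
            "connection": { "name": "@parameters('$connections')['keyvault']['connectionId']" }
        },
        "method": "get",
        "path": "/secrets/@{encodeURIComponent('MyApiPassword')}/value"
    },
    "runAfter": {}
}

It’s also worth switching on the ‘Secure Outputs‘ setting for this action, so the secret doesn’t appear in the run history.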

Get Token

This step is an HTTP request to Azure AD, using the POST method, to get an OAuth token. I believe the Logic App’s Resource Group will need an App Registration (essentially an Active Directory account) in order to do this.

The Logic App will make an HTTP request to the Active Directory, with the instance’s Tenant ID included in the URI:

https://login.microsoftonline.com/@{parameters('TenantId')}/oauth2/v2.0/token

The request payload will be something like:

grant_type=client_credentials&scope=api://@{parameters('MyApplicationScope')}/.default

For this, I’ve used basic username and password authentication.
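Putting those pieces together, the code view for this step might look something like the sketch below. The ‘ClientId‘ parameter and the ‘Get_Secret‘ action name are assumptions here; the password is the secret read in the previous step:

"Get_Token": {
    "type": "Http",
    "inputs": {
        "method": "POST",
        "uri": "https://login.microsoftonline.com/@{parameters('TenantId')}/oauth2/v2.0/token",
        "headers": { "Content-Type": "application/x-www-form-urlencoded" },
        "body": "grant_type=client_credentials&scope=api://@{parameters('MyApplicationScope')}/.default",
        "authentication": {
            "type": "Basic",
            "username": "@parameters('ClientId')",
            "password": "@body('Get_Secret')?['value']"
        }
    },
    "runAfter": { "Get_Secret": [ "Succeeded" ] }
}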

AuthKey Variable

This stage is merely to extract a value from the token response and declare it as a variable for the Logic App. This value is to be used in the Authorization header field of subsequent HTTP requests to the external API.

The variable name is set as ‘AuthKey‘, and the value is the following expression:

concat('Bearer ',body('Get_Token')?['access_token'])
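In code view, this is an Initialize variable action along the following lines (the action name is illustrative):

"Initialize_AuthKey": {
    "type": "InitializeVariable",
    "inputs": {
        "variables": [
            {
                "name": "AuthKey",
                "type": "string",
                "value": "@{concat('Bearer ',body('Get_Token')?['access_token'])}"
            }
        ]
    },
    "runAfter": { "Get_Token": [ "Succeeded" ] }
}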

Using the AuthKey Variable

This goes into the Authorization HTTP header of the REST request. The JSON for this step will look something like:

"inputs": {
    "method": "GET",
    "uri": "@{parameters('API-URL')}/products",
    "headers": { "Authorization": "@variables('AuthKey')" }
}

How secure is Azure Key Vault?

Azure Key Vault is designed to store (and protect) secrets such as API keys, passwords, cryptographic keys, connection strings, etc. It can store the following categories of data:

  • Keys
  • Certificates
  • Secrets

The Secrets storage can store arbitrary plaintext values, so Key Vault could potentially be used as a means of centrally managing a collection of usernames and passwords that could be copied and pasted straight from the Azure Portal. Is this a safe method of password management, though? How secure is a Key Vault, really? Is it marginally more secure than storing API keys, client secrets and connection strings in a Web application’s configuration file?

The Microsoft documentation states that:

‘The Key Vault front end (data plane) is a multi-tenant server. This means that key vaults from different customers can share the same public IP address. In order to achieve isolation, each HTTP request is authenticated and authorized independently of other requests’.

It would seem that Key Vaults from different customers can share the same front-end infrastructure, so there’s no physical segregation; the isolation is logical, enforced by authenticating and authorising each request.

Elsewhere, the documentation also states:

‘Azure Key Vault uses nCipher HSMs, which are Federal Information Processing Standards (FIPS) 140-2 Level 2 validated. You can use nCipher tools to move a key from your HSM to Azure Key Vault.’

That’s another way of saying that Key Vault ultimately uses Hardware Security Modules (nCipher is now part of Entrust), rather than typical server hardware, to protect the secrets, and that the HSMs have their own method of restricting what can access their storage.

As for user access, there are two methods of restricting this:

  • Azure role-based access control, which involves using Managed Identities and Azure Active Directory to determine whether an application or service is authorised to access the Key Vault.
  • IP address: This is effectively a default-deny firewall that only allows access to users, applications and services connecting from specific IP address ranges, even via the Azure Portal. This is useful if we want Key Vault to be accessible only to users on the corporate network, whether physically or via a VPN gateway. Access can also be allowed for named virtual networks. (A sketch of how these restrictions might look in a template follows this list.)
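As a rough idea of how both restrictions look when a Key Vault is defined in an ARM template, here is a sketch with placeholder values, not a working deployment:

{
    "type": "Microsoft.KeyVault/vaults",
    "apiVersion": "2023-02-01",
    "name": "my-key-vault",
    "location": "[resourceGroup().location]",
    "properties": {
        "tenantId": "[subscription().tenantId]",
        "sku": { "family": "A", "name": "standard" },
        "enableRbacAuthorization": true,
        "networkAcls": {
            "defaultAction": "Deny",
            "bypass": "AzureServices",
            "ipRules": [ { "value": "203.0.113.0/24" } ],
            "virtualNetworkRules": [ { "id": "<resource ID of an allowed subnet>" } ]
        }
    }
}

With ‘defaultAction‘ set to ‘Deny‘, anything not matching the IP rules or the named virtual networks is blocked before authentication even comes into play.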

So an attacker would need more than access to an authorised user’s AD account, which should be protected to some extent by 2FA; they would also need access to a machine or proxy server on the authorised network.

Of course, Key Vault is designed for authorised applications and services to read secrets from, and that means there’s an API for it. We can configure the security of an API connection to a Key Vault by requiring that connection to be authenticated; options include Azure AD authentication and a Managed Identity. This restricts API access to authorised applications and clients.


Further Reading

Microsoft: About Azure Key Vault

Microsoft: Azure Key Vault Security

If you must use Power BI Report Builder…

My brief encounter with Power BI began after one of the project managers had the idea of using Power BI Report Builder as a means for generating archived versions of what’s displayed on a site I developed not long ago, because he wanted a PDF export feature for that.

‘Why not just copy a couple of the API methods and add a PDF converter to them?‘, I wondered, with the feeling I’d end up needing to do that anyway.

It turned out another team and several engineers had already attempted this, and gave up. I had a lot more success and produced something approaching a working solution, but came to the same conclusion: Report Builder is the wrong tool for the job.

Power BI is an analysis tool for Excel people and data analysts who typically do Excel things and know more about DAX than a SQL admin/developer would care to know – hence the terminology I used here might be incorrect.

The main problem, I soon discovered, is that Report Builder not only fetched everything from the data source, but created a row for every permutation of the records and the ones they’re mapped to in other database tables – we’re talking literally millions of rows if everything we need is added to the main report without a filtering parameter in the DAX query itself. A cursory search on the Web told me this is a common problem, and I offer a partial solution here.

Loading Data into PowerBI Report Builder

The data layer of a PowerBI report is very analogous to that in a .NET application: There is a connection and a data model. In the left-hand window, these are labelled ‘Data Sources‘ and ‘Datasets‘.

The ‘Data Sources‘ folder will contain objects representing connections to databases and APIs – these should already exist in whichever PowerBI workspace is being used. The ‘Datasets‘ folder will contain the model of the data tables that are loaded from the .pbix file.

The dataset will be empty until a query is added. Right-click the dataset’s name, then ‘Query…‘. Each entry under that node represents a table within the data model – perhaps a database table. Field names from within each must be dragged into the main window to get the data. That’s how a very basic report is created.

Setting up a filter or query

Even in a basic report, the data would need to be queryable, so the report can show only the data for one specific record.

There are two methods of parameterising a Report Builder report to view a selected record. The first selects rows from the data that’s already been fetched from the dataset. The other method, which I describe later on – parameterising the DAX query – filters the data before it’s loaded into Report Builder, and that’s a partial solution to the data loading problem.

Setting up a query for the data that’s already loaded is a two-step process. First we need to add a parameter, which is basically an empty variable declared as a string, integer, etc.

Click ‘Parameters‘ -> ‘Add Parameter…‘.

The only section to worry about in the Report Parameter Properties window is the initial one. The Name and Prompt are the parameter name and the UI label for that parameter. Here I’ve set the parameter type as a string and allowed the value to be null.

The next step is to add a filter to the dataset and map it to the parameter that was just added. Right-click on the dataset, and select ‘Dataset Properties‘. In the Dataset Properties window, filters can be added in the Filters window by mapping the parameter to any field in the dataset.

What can we do about the loading time?

The only solution I can find, after some digging, is to limit or trim the number of records Report Builder fetches before they’re loaded into the report. That involves playing with DAX. The thing is, DAX is a data manipulation language, not a database query language, but I’m hoping we can do basic SQL-like operations with it.

Clearly the default query is pulling a vast amount of data. It does the equivalent of a SQL SELECT. It will look something like this:

EVALUATE SUMMARIZECOLUMNS('SitePages'[page_id], 'SitePages'[page_title], 'SitePages'[date_published], 'SitePages'[page_content], 'SitePages'[page_tags], 'SitePages'[related_content])

This will fetch all records and instances of them for each mapped record.

I’ve tried using the DAX equivalent of SELECT * WHERE page_id = '…'. This uses the FILTER function, e.g.

EVALUATE FILTER (SUMMARIZECOLUMNS ('SitePages'[page_id], 'SitePages'[page_title], 'SitePages'[date_published], 'SitePages'[page_content], 'SitePages'[page_tags], 'SitePages'[related_content]),    'SitePages'[page_title] = "My Site Page")

It still takes a good ten minutes to load, as even then it creates a huge number of rows representing all the relational data being included.

Filtering data by in-query parameter

The DAX query couldn’t access the parameter I already added for the report, so it had to be declared again in the query editor.

When I added a parameter to the Query Designer window and ran the report, the parameter appeared in the parameter box at the top. This is where the parameter that was added can be declared and mapped to a field in the report.

Add a parameter to the query so it looks something like this:

EVALUATE DISTINCT (FILTER (SUMMARIZECOLUMNS ('SitePages'[page_id], 'SitePages'[page_title], 'SitePages'[date_published], 'SitePages'[page_content], 'SitePages'[page_tags], 'SitePages'[related_content]), 'SitePages'[page_title] = @PageTitle))

Recommended: Use Sub-Reports

The report will run much faster if relational data is loaded as subreports. The process for this is a little drawn out, partly because there’s a lot of parameter passing, and because Report Builder only allows subreports to be imported from a Power BI workspace.

Nothing I read online worked for me, so I’m hoping others would find this useful:

  • In the sub-report file, set a parameter, under the ‘Parameters‘ section. There’s nothing special about this, as it’s just a placeholder. Set the Name, Prompt and possibly the null and multiple values options in the General tab. Don’t change anything in the other tabs unless you need to.
  • In the Datasets section, right-click the dataset, select ‘Dataset Properties‘, and go straight to the Filters tab. The Expression field is the primary key we want to search records on. The Value field should point to the parameter that was created.
  • Now, when the subreport is run, it should display the records in which the primary key matches whatever was entered as the parameter.
  • In the main report, we want to pass one of the main dataset’s values to the subreport as its parameter. Add a subreport, and point it to the subreport file. In the ‘Change subreport parameters‘ window, the parameter in the subreport should appear in the dropdown for the ‘Name‘ field. After clicking the expression button, we can select the field from the main report’s dataset to pass to it.

Dependency scanning tools as solutions to a different problem

Since dependency vulnerability scanning has been suggested by colleagues as something that could help with security/standards compliance and addressing legacy software, I’ve been looking into a few services to see how useful the available options could be.

NPM Audit

Later versions of NPM perform dependency scanning by default as part of the install command, and the output of npm audit shows a few lines of relevant information about each package with a reported vulnerability – its severity rating, a brief description of the vulnerability and whether an update is available.

Initially this seems like a very useful thing. Software dependencies will have defects and security vulnerabilities, and it makes sense to have an automated tool that alerts us to them. However, in a medium-sized application, this one flagged hundreds of packages as having vulnerabilities, and a handful of those were marked as ‘critical’. They warranted a deeper look. Some of the reports didn’t have any information attached telling us why a given severity rating was assigned, or any actual research, and this is my main criticism of the current dependency scanning services I’ve looked at. Why, exactly, is a given vulnerability marked as ‘critical’? Is it even exploitable in the context of how the package is being used? How can we know someone hasn’t merely copied and pasted the output of a scanning tool and arbitrarily assigned a metric or two? Are some of the results false positives?

A few of us had been thinking that, if our organisation did make serious use of this,  several people would need to be tasked with doing the research and trying to make informed decisions about whether to upgrade certain dependencies as a priority. We have run into situations where functions in one package version were deprecated and replaced by something else in a later version, making it necessary to rewrite some of our code.

Scanning in Visual Studio and the Package Manager Console

The NuGet Package Manager provides very useful information, under the Updates tab in the NuGet window, about whether the installed dependencies for a project can and should be updated. To check them against known vulnerabilities, the following command can be run in the Package Manager Console:

dotnet list package --vulnerable

This will use api.nuget.org to compare the installed package references with known vulnerabilities in the GitHub Advisory Database. The output isn’t too different to what npm audit gives us.

Project `MyProject.Api` has the following vulnerable packages
   [net6.0]:
   Top-level Package          Requested   Resolved   Severity   Advisory URL
   > System.Data.SqlClient    4.8.3       4.8.3      Moderate   https://github.com/advisories/GHSA-8g2p-5pqh-5jmc

Obviously, most .NET applications wouldn’t be using anywhere near the same number of dependencies as Node.js/React ones, and the vulnerability reports I’ve seen so far are more actionable. Plus, when I take on a new project, I do make a point of updating package and framework versions.

Dependabot

A feature of GitHub, this performs a very similar function to the local scanning methods, but I think this is more useful for a DevOps team that wants to use dependency scanning as a pre-deployment check.

In a project, go to the Settings tab. Under the ‘Code security and analysis‘ section, there will be options for Dependabot.

Here the following can be enabled:

  • Dependabot alerts
  • Dependabot security updates
  • Dependabot version updates

If we just want Dependabot to scan for vulnerabilities and notify us of them, without making changes to the code, just enable the first option. The alerts (if any) can be viewed under the project’s Security tab, and in the ‘Vulnerability alerts‘ section.

Under the Security tab, it’s also possible to set up code scanning, and GitHub provides the CodeQL Analysis tool. Setting this up involves creating a .yml file in the project, and adding whichever configuration section is appropriate for the project type.

MEND SCA

Formerly known as Whitesource, MEND is the solution we decided on, as we develop with a variety of frameworks, languages and platforms. The MEND dashboard – which is accessible to everyone in the development and DevOps teams – provides a single point of monitoring all this.

(There appear to be a lot of ‘critical’ vulnerabilities, but the products being scanned were all updated within the last few months)

Each entry under the Library column links to information about the module/library, and a description of the vulnerability reported for it. Here we also find links to the CVEs. We might even get Base Score Metrics that could help us assess the risk of exploitation. The data we get from MEND is highly specific and granular, but I still came away with the belief that it’s only really useful for getting a general idea of the state of the software we’re supporting.

With this in mind, the Security Trends charts are better used for getting an idea of how quickly technical debt and legacy packages are accumulating across everything we support.

One of the more interesting things I noticed about MEND is that it also scans the base images of Docker containers, and all the packages within them. That’s something that might otherwise be overlooked.

Is dependency scanning that useful?

All the solutions I’ve looked at are variations on the same thing: they read package manifests (e.g. package.json or a Visual Studio project file), perform a lookup for each dependency in a vulnerability database and present the results in one form or another.

The main criticism I have of this is that the scanning isn’t directly useful for enhancing the security of applications, largely because the metrics won’t accurately reflect how exploitable they are. After all, there’s only one reference to this in the current OWASP Top 10.

Higher-level security testing of the application itself, with the right framework, would give us a more accurate idea of how exploitable it is, and would cover other vulnerabilities related to how the software is put together and configured.

What is it useful for, then? I believe dependency scanning is more appropriate as a tool for addressing technical debt. It can be used for encouraging developers to keep the packages they use reasonably updated, and to help prevent a situation in which a huge amount of legacy components have accumulated.

It had been tentatively suggested that we could use these systems – MEND in particular – for continual monitoring and patching. We are already upgrading, patching, documenting and supporting a large number of applications and services, with each upgrade involving a necessarily complicated process to mitigate the risk of disrupting anything critical that happens to depend on it. Continually resolving everything flagged by the scanners and releasing updated versions of the software wouldn’t be realistic, unless there was an additional team dedicated to just that. Even then, that team would be hammered with alerts, so we’d need a way of filtering out the ones that aren’t useful.

In my opinion, dependency scanners should be treated as tools for software engineers with the expertise to make informed decisions about how the results should be acted on, and as a source of information about the general state of the software and technical debt.