Functions

1. Function: extract_filename

extract_filename helps to extract information from a file name. The syntax is extract_filename(filename, split = " ", end = ".csv", remove = " ", sep = "-"). filename is the file name. split is the character at which the name is split (default is a space, " "). end is the file extension to be removed (default is ".csv"). remove is the portion of the file name to be omitted after splitting (default is a space, " "). sep is the symbol inserted between the separate sections (default is "-").

This function is useful for extracting specific information from file names, such as compound name or plate number, so that the appropriate analysis can be applied.

For example:

extract_filename("L HEPG2 P3 72HRS.csv")
#> [1] "L HEPG2 P3 72HRS.csv" "L-HEPG2-P3-72HRS"     "L"                   
#> [4] "HEPG2"                "P3"                   "72HRS"

extract_filename("L HEPG2 P3 72HRS.csv", split=" ",end=".csv",remove="L",sep="")
#> [1] "L HEPG2 P3 72HRS.csv" "HEPG2P372HRS"         "HEPG2"               
#> [4] "P3"                   "72HRS"
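The default behaviour shown in the first example can be approximated in base R. This is only a minimal sketch of the split-and-rejoin idea, assuming the extension is stripped before the name is split; it is not the package's implementation, and the variable names are illustrative.

```r
# Illustrative sketch only: mimic the default call with base R.
fname <- "L HEPG2 P3 72HRS.csv"

# Drop the extension (the 'end' argument), matching it literally.
stem <- sub(".csv", "", fname, fixed = TRUE)

# Split at each space (the 'split' argument).
parts <- strsplit(stem, " ", fixed = TRUE)[[1]]

# Rejoin the sections with "-" (the 'sep' argument).
joined <- paste(parts, collapse = "-")

# Original name, rejoined name, then the individual sections.
result <- c(fname, joined, parts)
print(result)
```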

2. Function: rmodd_summary

rmodd_summary helps to remove outliers and summarise the values of a numeric vector. The syntax is rmodd_summary(x, rm = "FALSE", strict = "FALSE", cutoff = 80, n = 3). x is a numeric vector. Set rm = "TRUE" to remove outliers. If strict = "FALSE", values above or below 1.5 IQR are omitted as outliers. If strict = "TRUE", more aggressive outlier removal is used to bring the %CV below cutoff. n is the minimum number of samples required per group when the more aggressive outlier removal is used.

For example:

x<- c(1.01,0.98,0.6,0.54,0.6,0.6,0.4,3)
rmodd_summary(x, rm = "FALSE", strict= "FALSE", cutoff=80,n=3)
#>       mean     median          n         sd         cv 
#>  0.9662500  0.6000000  8.0000000  0.8487796 87.8426480

rmodd_summary(x, rm = "TRUE", strict= "FALSE", cutoff=80,n=3)
#>       mean     median          n         sd         cv 
#>  0.6757143  0.6000000  7.0000000  0.2294818 33.9613684

rmodd_summary(x, rm = "TRUE", strict= "TRUE", cutoff=20,n=5)
#>       mean     median          n         sd         cv 
#>  0.7216667  0.6000000  6.0000000  0.2132057 29.5435138
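The non-strict case (rm = "TRUE", strict = "FALSE") can be sketched in base R. The sketch assumes the conventional 1.5 IQR fences around the quartiles (default quantile type); whether the package computes its fences in exactly this way is an assumption. On the example vector it drops the value 3 and gives the same mean, n, sd and cv as the second call above.

```r
# Illustrative sketch only: 1.5 * IQR outlier removal, then a summary.
x <- c(1.01, 0.98, 0.6, 0.54, 0.6, 0.6, 0.4, 3)

# Quartiles and interquartile range (default quantile type).
q <- unname(stats::quantile(x, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Keep only the values inside the fences.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
kept <- x[x >= lower & x <= upper]

# Same statistics as reported by rmodd_summary; cv is in percent.
summary_stats <- c(
  mean   = mean(kept),
  median = stats::median(kept),
  n      = length(kept),
  sd     = stats::sd(kept),
  cv     = 100 * stats::sd(kept) / mean(kept)
)
print(summary_stats)
```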

3. Function: data2plateformat

data2plateformat converts the data (e.g. readings from a 96-well plate) to the appropriate matrix format. The syntax is data2plateformat(data, platetype = 96). data is the data to be formatted. platetype is the type of plate from which the data come; it can be 6, 12, 24, 96 or 384, representing the corresponding multiwell plate.

For example, to rename the columns and rows of ‘rawdata96’ to the right format:

rawdata<-data2plateformat(rawdata96,platetype = 96)
head(rawdata)
#>       1     2     3     4     5     6     7     8     9    10    11    12
#> A 0.659 0.649 0.598 0.601 0.541 0.553 0.568 0.519 0.576 0.575 0.583 0.504
#> B 0.442 0.455 0.586 0.563 0.525 0.548 0.511 0.503 0.533 0.559 0.529 0.535
#> C 0.278 0.266 0.491 0.562 0.510 0.473 0.467 0.433 0.382 0.457 0.475 0.510
#> D 0.197 0.199 0.452 0.456 0.421 0.431 0.409 0.401 0.458 0.412 0.408 0.403
#> E 0.177 0.174 0.447 0.437 0.392 0.412 0.368 0.396 0.397 0.358 0.360 0.393
#> F 0.141 0.137 0.277 0.337 0.294 0.279 0.257 0.263 0.262 0.292 0.280 0.300
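The idea of labelling readings in plate layout can be sketched in base R as follows. The helper label_plate and the lookup table plate_dims are hypothetical names introduced for illustration, and the sketch assumes the readings arrive as a plain vector in row-by-row order; the package function may accept different input.

```r
# Illustrative sketch only: arrange readings as a labelled plate matrix.
# Rows x columns of the standard multiwell layouts.
plate_dims <- list("6" = c(2, 3), "12" = c(3, 4), "24" = c(4, 6),
                   "96" = c(8, 12), "384" = c(16, 24))

# Hypothetical helper, not part of the package.
label_plate <- function(values, platetype = 96) {
  dims <- plate_dims[[as.character(platetype)]]
  if (is.null(dims)) stop("unsupported plate type")
  stopifnot(length(values) == prod(dims))
  # Fill row by row, then label rows with letters and columns with numbers.
  m <- matrix(values, nrow = dims[1], ncol = dims[2], byrow = TRUE)
  dimnames(m) <- list(LETTERS[seq_len(dims[1])],
                      as.character(seq_len(dims[2])))
  m
}

# Dummy readings 1..24 for a 24-well plate.
plate <- label_plate(seq_len(24), platetype = 24)
print(plate)
```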

4. Function: plate2df

plate2df formats matrix-type 2D data from multiwell plates as a dataframe. The function uses the column names and row names of ‘datamatrix’ (2D data of a multiwell plate) to generate a dataframe with row, col (column) and position indices. The ‘value’ column holds the corresponding value from ‘datamatrix’.

The syntax is plate2df(datamatrix). datamatrix is the data in matrix format.

For example:

OD_df <- plate2df(rawdata)
head(OD_df)
#>   row col position value
#> 1   A   1      A01 0.659
#> 2   A   2      A02 0.649
#> 3   A   3      A03 0.598
#> 4   A   4      A04 0.601
#> 5   A   5      A05 0.541
#> 6   A   6      A06 0.553
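The reshaping step can be sketched in base R on a small matrix. This is an illustration of the wide-to-long idea under the assumption that wells are listed row by row and that positions are zero-padded to two digits, as in the output above; it is not the package's code.

```r
# Illustrative sketch only: plate matrix to long dataframe.
m <- matrix(c(0.659, 0.649, 0.598,
              0.442, 0.455, 0.586),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), as.character(1:3)))

df <- data.frame(
  row = rep(rownames(m), each = ncol(m)),                # A A A B B B
  col = rep(as.integer(colnames(m)), times = nrow(m)),   # 1 2 3 1 2 3
  stringsAsFactors = FALSE
)

# Well label such as "A01": row letter plus zero-padded column number.
df$position <- sprintf("%s%02d", df$row, df$col)

# Transpose first so that values are read out row by row.
df$value <- as.vector(t(m))
print(df)
```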

5. Function: matrix96

matrix96 helps to convert a dataframe into matrix format. The syntax is matrix96(dataframe, column, rm = "FALSE"). dataframe is the dataframe to be formatted; it should have “row” and “col” columns for the function to work. column is the name of the column to be converted into a matrix. If rm = "TRUE", negative and NA values are set to 0.

For example:

matrix96(OD_df,"value")
#>       1     2     3     4     5     6     7     8     9    10    11    12
#> A 0.659 0.649 0.598 0.601 0.541 0.553 0.568 0.519 0.576 0.575 0.583 0.504
#> B 0.442 0.455 0.586 0.563 0.525 0.548 0.511 0.503 0.533 0.559 0.529 0.535
#> C 0.278 0.266 0.491 0.562 0.510 0.473 0.467 0.433 0.382 0.457 0.475 0.510
#> D 0.197 0.199 0.452 0.456 0.421 0.431 0.409 0.401 0.458 0.412 0.408 0.403
#> E 0.177 0.174 0.447 0.437 0.392 0.412 0.368 0.396 0.397 0.358 0.360 0.393
#> F 0.141 0.137 0.277 0.337 0.294 0.279 0.257 0.263 0.262 0.292 0.280 0.300
#> G 0.122 0.118 0.300 0.288 0.293 0.251 0.245 0.270 0.261 0.259 0.271 0.271
#> H 0.107 0.102 0.320 0.340 0.319 0.270 0.262 0.277 0.294 0.278 0.307 0.316

matrix96(OD_df,"position")
#>   1     2     3     4     5     6     7     8     9     10    11    12   
#> A "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11" "A12"
#> B "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10" "B11" "B12"
#> C "C01" "C02" "C03" "C04" "C05" "C06" "C07" "C08" "C09" "C10" "C11" "C12"
#> D "D01" "D02" "D03" "D04" "D05" "D06" "D07" "D08" "D09" "D10" "D11" "D12"
#> E "E01" "E02" "E03" "E04" "E05" "E06" "E07" "E08" "E09" "E10" "E11" "E12"
#> F "F01" "F02" "F03" "F04" "F05" "F06" "F07" "F08" "F09" "F10" "F11" "F12"
#> G "G01" "G02" "G03" "G04" "G05" "G06" "G07" "G08" "G09" "G10" "G11" "G12"
#> H "H01" "H02" "H03" "H04" "H05" "H06" "H07" "H08" "H09" "H10" "H11" "H12"
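The reverse reshaping, including the effect described for rm = "TRUE", can be sketched in base R. The toy dataframe and variable names are illustrative, and the sketch only assumes what the text states: values are placed by “row” and “col”, and negative or NA entries become 0.

```r
# Illustrative sketch only: long dataframe back to a plate matrix.
df <- data.frame(
  row   = c("A", "A", "B", "B"),
  col   = c(1, 2, 1, 2),
  value = c(0.659, -0.010, NA, 0.455),
  stringsAsFactors = FALSE
)

rows <- sort(unique(df$row))
cols <- sort(unique(df$col))

# Empty matrix with plate-style labels.
mat <- matrix(NA_real_, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, as.character(cols)))

# Place each value at its (row, col) position via matrix indexing.
idx <- cbind(match(df$row, rows), match(df$col, cols))
mat[idx] <- df$value

# Equivalent of rm = "TRUE": negative and NA values are set to 0.
cleaned <- mat
cleaned[is.na(cleaned) | cleaned < 0] <- 0
print(cleaned)
```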

6. Function: plate_metadata

plate_metadata combines the plate-specific information (such as compound used, standard concentration, dilution of samples, etc.) with the metadata to produce unique plate metadata. The syntax is plate_metadata(plate_details, metadata, mergeby = "type"). plate_details is the plate-specific information that needs to be added to the metadata. metadata is the metadata for the whole experiment. mergeby is the column that is common to both metadata and plate_details (this column is used for merging the information).

For example, an incomplete metadata file:

head(metafile96)
#>   row col position type     id concentration dilution
#> 1   A   1      A01 STD1    STD            25       NA
#> 2   A   2      A02 STD1    STD            25       NA
#> 3   A   3      A03   S1 Sample            NA       NA
#> 4   A   4      A04   S1 Sample            NA       NA
#> 5   A   5      A05   S1 Sample            NA       NA
#> 6   A   6      A06   S1 Sample            NA       NA

The plate-specific details are:

plate_details <- list("compound" = "Taxol",
                "concentration" = c(0.00,0.01,0.02,0.05,0.10,1.00,5.00,10.00),
                "type" = c("S1","S2","S3","S4","S5","S6","S7","S8"),
                "dilution" = 1)

Using the plate-specific information, the metadata can be filled in by calling the plate_metadata function.

plate_meta<-plate_metadata(plate_details,metafile96,mergeby="type")
head(plate_meta)
#>   row col type position     id dilution concentration compound
#> 1   A   1 STD1      A01    STD       NA            25     <NA>
#> 2   A   2 STD1      A02    STD       NA            25     <NA>
#> 3   A   3   S1      A03 Sample        1             0    Taxol
#> 4   A   4   S1      A04 Sample        1             0    Taxol
#> 5   A   5   S1      A05 Sample        1             0    Taxol
#> 6   A   6   S1      A06 Sample        1             0    Taxol
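The fill-in behaviour seen in this output (existing values such as the standard concentration are kept, missing ones are taken from the plate details) can be sketched with base R's merge(). The toy tables and the rule "only fill where the metadata is NA" are assumptions made for illustration, not a description of the package internals.

```r
# Illustrative sketch only: left-join plate details onto metadata by "type".
meta <- data.frame(
  position      = c("A01", "A02", "A03", "A04"),
  type          = c("STD1", "STD1", "S1", "S2"),
  concentration = c(25, 25, NA, NA),
  stringsAsFactors = FALSE
)
details <- data.frame(
  type          = c("S1", "S2"),
  concentration = c(0, 0.01),
  compound      = "Taxol",   # a single value is recycled to every row
  stringsAsFactors = FALSE
)

# Keep every metadata row; shared columns get .x / .y suffixes.
merged <- merge(meta, details, by = "type", all.x = TRUE)

# Assumed rule: keep the metadata value, fill only where it is missing.
merged$concentration <- ifelse(is.na(merged$concentration.x),
                               merged$concentration.y,
                               merged$concentration.x)
merged$concentration.x <- NULL
merged$concentration.y <- NULL

# Restore plate order.
merged <- merged[order(merged$position), ]
print(merged)
```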

To join plate_meta and OD_df, inner_join (a dplyr function) can be used.

data_DF<- dplyr::inner_join(OD_df,plate_meta,by=c("row","col","position"))

head(data_DF)
#>   row col position value type     id dilution concentration compound
#> 1   A   1      A01 0.659 STD1    STD       NA            25     <NA>
#> 2   A   2      A02 0.649 STD1    STD       NA            25     <NA>
#> 3   A   3      A03 0.598   S1 Sample        1             0    Taxol
#> 4   A   4      A04 0.601   S1 Sample        1             0    Taxol
#> 5   A   5      A05 0.541   S1 Sample        1             0    Taxol
#> 6   A   6      A06 0.553   S1 Sample        1             0    Taxol

7. Function: heatplate

heatplate helps to create a heatmap of a multiwell plate. The syntax is heatplate(datamatrix, name, size = 7.5). datamatrix is the data in matrix format; an easy way to create it is by calling ‘matrix96’ as explained before. name is the name to be given to the heatmap. size is the size of each well in the heatmap (default is 7.5).

This function gives a heatmap of normalized values if the ‘variable’ is numeric. If it is a categorical (factor) variable, it simply provides a coloured categorical plot.

eg 1. Categorical plot

datamatrix<-matrix96(metafile96,"id")
datamatrix
#>   1       2       3        4        5        6        7        8       
#> A "STD"   "STD"   "Sample" "Sample" "Sample" "Sample" "Sample" "Sample"
#> B "STD"   "STD"   "Sample" "Sample" "Sample" "Sample" "Sample" "Sample"
#> C "STD"   "STD"   "Sample" "Sample" "Sample" "Sample" "Sample" "Sample"
#> D "STD"   "STD"   "Sample" "Sample" "Sample" "Sample" "Sample" "Sample"
#> E "STD"   "STD"   "Sample" "Sample" "Sample" "Sample" "Sample" "Sample"
#> F "STD"   "STD"   "Sample" "Sample" "Sample" "Sample" "Sample" "Sample"
#> G "STD"   "STD"   "Sample" "Sample" "Sample" "Sample" "Sample" "Sample"
#> H "Blank" "Blank" "Sample" "Sample" "Sample" "Sample" "Sample" "Sample"
#>   9        10       11       12      
#> A "Sample" "Sample" "Sample" "Sample"
#> B "Sample" "Sample" "Sample" "Sample"
#> C "Sample" "Sample" "Sample" "Sample"
#> D "Sample" "Sample" "Sample" "Sample"
#> E "Sample" "Sample" "Sample" "Sample"
#> F "Sample" "Sample" "Sample" "Sample"
#> G "Sample" "Sample" "Sample" "Sample"
#> H "Sample" "Sample" "Sample" "Sample"

heatplate(datamatrix,"Plate 1", size=5)

eg 2. Heatmap

rawdata<-data2plateformat(rawdata96,platetype = 96)
OD_df<- plate2df(rawdata)
data<-matrix96(OD_df,"value")
data
#>       1     2     3     4     5     6     7     8     9    10    11    12
#> A 0.659 0.649 0.598 0.601 0.541 0.553 0.568 0.519 0.576 0.575 0.583 0.504
#> B 0.442 0.455 0.586 0.563 0.525 0.548 0.511 0.503 0.533 0.559 0.529 0.535
#> C 0.278 0.266 0.491 0.562 0.510 0.473 0.467 0.433 0.382 0.457 0.475 0.510
#> D 0.197 0.199 0.452 0.456 0.421 0.431 0.409 0.401 0.458 0.412 0.408 0.403
#> E 0.177 0.174 0.447 0.437 0.392 0.412 0.368 0.396 0.397 0.358 0.360 0.393
#> F 0.141 0.137 0.277 0.337 0.294 0.279 0.257 0.263 0.262 0.292 0.280 0.300
#> G 0.122 0.118 0.300 0.288 0.293 0.251 0.245 0.270 0.261 0.259 0.271 0.271
#> H 0.107 0.102 0.320 0.340 0.319 0.270 0.262 0.277 0.294 0.278 0.307 0.316

heatplate(data,"Plate 1", size=5)

8. Function: reduceblank

reduceblank helps to subtract blank values from the readings.

The syntax is reduceblank(dataframe, x_vector, blank_vector, y). dataframe is the data. x_vector lists the entries from which the blank has to be subtracted; if it has to be subtracted from all entries, use "All". x_vector should be in vector format, e.g. c("drug1", "drug2", "drug3"). blank_vector is the vector of blank names whose values are to be subtracted, also in vector format, e.g. c("blank1", "blank2", "blank3", "blank4"). The function subtracts the first blank_vector element from the first x_vector element, and so on. y is the name of the column on which the operation is performed; it should be numeric. The results appear in a new column named ‘blankminus’.

For example:

data_DF<-reduceblank(data_DF, x_vector =c("All"),blank_vector = c("Blank"), "value")
head(data_DF)
#>   row col position value type     id dilution concentration compound blankminus
#> 1   A   1      A01 0.659 STD1    STD       NA            25     <NA>     0.5545
#> 2   A   2      A02 0.649 STD1    STD       NA            25     <NA>     0.5445
#> 3   A   3      A03 0.598   S1 Sample        1             0    Taxol     0.4935
#> 4   A   4      A04 0.601   S1 Sample        1             0    Taxol     0.4965
#> 5   A   5      A05 0.541   S1 Sample        1             0    Taxol     0.4365
#> 6   A   6      A06 0.553   S1 Sample        1             0    Taxol     0.4485
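The arithmetic behind ‘blankminus’ in this example can be checked by hand: the two blank wells on the plate read 0.107 and 0.102 (mean 0.1045), and 0.659 - 0.1045 = 0.5545, as in the first row. A minimal base R sketch of that subtraction follows; the toy dataframe is illustrative and only the single-blank, "All" case is covered.

```r
# Illustrative sketch only: subtract the mean blank reading from every well.
wells <- data.frame(
  id    = c("STD", "STD", "Sample", "Blank", "Blank"),
  value = c(0.659, 0.649, 0.598, 0.107, 0.102),
  stringsAsFactors = FALSE
)

# Mean of the wells labelled "Blank".
blank_mean <- mean(wells$value[wells$id == "Blank"])

# New column holding the blank-corrected readings.
wells$blankminus <- wells$value - blank_mean
print(wells)
```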

9. Function: estimate

estimate helps to estimate an unknown variable (e.g. concentration) from a standard curve. The syntax is estimate(data = dataframe, colname = "blankminus", fitformula = fit, method = "linear/nplr"). data is the dataframe to be evaluated. colname is the name of the column whose values are used for the estimation. fitformula is the fitted model to be used. method specifies whether a linear model ("linear") or an n-parameter logistic regression model ("nplr") was used for fitformula.

For example, data_DF is a dataframe for which the concentration has to be estimated from the values of blankminus.

To filter the ‘standards’:

std<- dplyr::filter(data_DF, data_DF$id=="STD")  
std<- aggregate(std$blankminus ~ std$concentration, FUN = mean )
colnames (std) <-c("con", "OD")
head(std)
#>     con     OD
#> 1  0.39 0.0155
#> 2  0.78 0.0345
#> 3  1.56 0.0710
#> 4  3.13 0.0935
#> 5  6.25 0.1675
#> 6 12.50 0.3440

To fit a standard curve.

fit1 is the three-parameter logistic curve model and fit2 is the linear regression model. Use whichever is appropriate for your experiment.

fit2<-stats::lm(formula = con ~ OD,data = std)# linear model
fit1<-nplr::nplr(std$con,std$OD,npars=3,useLog = FALSE)#  nplr, 3 parameter model

To estimate the concentration using the linear model:

estimated<-estimate(data_DF,colname="blankminus",fitformula=fit2,method="linear")
head(estimated)
#>   row col position value type     id dilution concentration compound blankminus
#> 1   A   1      A01 0.659 STD1    STD       NA            25     <NA>     0.5545
#> 2   A   2      A02 0.649 STD1    STD       NA            25     <NA>     0.5445
#> 3   A   3      A03 0.598   S1 Sample        1             0    Taxol     0.4935
#> 4   A   4      A04 0.601   S1 Sample        1             0    Taxol     0.4965
#> 5   A   5      A05 0.541   S1 Sample        1             0    Taxol     0.4365
#> 6   A   6      A06 0.553   S1 Sample        1             0    Taxol     0.4485
#>   estimated
#> 1  23.96838
#> 2  23.51493
#> 3  21.20234
#> 4  21.33838
#> 5  18.61769
#> 6  19.16183
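For the linear case, the estimation step amounts to predicting con from OD with the fitted model. The sketch below uses only the six standard rows printed by head(std) above, so its numbers will not match the ‘estimated’ column, which comes from the full standard table; the unknown readings are hypothetical values chosen inside the range of those standards.

```r
# Illustrative sketch only: predict concentration from a linear standard curve.
# The six standard rows shown by head(std); the full table may hold more.
std <- data.frame(
  con = c(0.39, 0.78, 1.56, 3.13, 6.25, 12.50),
  OD  = c(0.0155, 0.0345, 0.0710, 0.0935, 0.1675, 0.3440)
)

# Same model form as fit2: concentration as a linear function of OD.
fit <- stats::lm(con ~ OD, data = std)

# Hypothetical blank-corrected readings of unknown samples.
unknown <- data.frame(OD = c(0.10, 0.20))

# Estimated concentration for each reading.
unknown$estimated <- as.numeric(stats::predict(fit, newdata = unknown))
print(unknown)
```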

To estimate the concentration using the nplr method:

estimated2<-estimate(data_DF,colname="blankminus",fitformula=fit1,method="nplr")
head(estimated2)
#>   row col position value type     id dilution concentration compound blankminus
#> 1   A   1      A01 0.659 STD1    STD       NA            25     <NA>     0.5545
#> 2   A   2      A02 0.649 STD1    STD       NA            25     <NA>     0.5445
#> 3   A   3      A03 0.598   S1 Sample        1             0    Taxol     0.4935
#> 4   A   4      A04 0.601   S1 Sample        1             0    Taxol     0.4965
#> 5   A   5      A05 0.541   S1 Sample        1             0    Taxol     0.4365
#> 6   A   6      A06 0.553   S1 Sample        1             0    Taxol     0.4485
#>   estimated
#> 1  26.39687
#> 2  24.01751
#> 3  18.68524
#> 4  18.88785
#> 5  15.70869
#> 6  16.23867

10. Function: dfsummary

dfsummary() helps to summarize the dataframe (based on a column). It has additional controls to group samples and to omit entries that are not needed. The syntax is dfsummary(dataframe, y, grp_vector, rm_vector, nickname, rm = "FALSE", param). dataframe is the data. y is the numeric variable (column name) to be summarized. grp_vector is a vector of column names by which the samples are grouped; the order of its elements determines the order of grouping. rm_vector is the vector of items to be omitted before summarizing. nickname is the name to be given to the output dataframe. Use rm = "FALSE" if outliers are not to be removed and rm = "TRUE" if they are. For a more stringent method of outlier removal, the parameters are supplied in the vector param, in the format c(strict = "TRUE", cutoff = 40, n = 12). For details, please refer to the rmodd_summary function.

For example, the “estimated” values are summarized, with samples grouped by “id” and then by “type”. “STD” and “Blank” entries are omitted, outliers are not removed (rm = "FALSE"), and the nickname for the plate is “plate1”.

result<-dfsummary(estimated,"estimated",c("id","type"),
        c("STD","Blank"),"plate1", rm="FALSE",
        param=c(strict="FALSE",cutoff=40,n=12))
#> F1
#> F2

result
#>       id type  label  N   Mean    SD    CV
#> 1 Sample   S1 plate1 10 19.561 1.465  7.49
#> 2 Sample   S2 plate1 10 18.536 1.141  6.15
#> 3 Sample   S3 plate1 10 15.670 2.194 14.00
#> 4 Sample   S4 plate1 10 13.362 1.026  7.68
#> 5 Sample   S5 plate1 10 12.043 1.359 11.29
#> 6 Sample   S6 plate1 10  6.969 1.066 15.30
#> 7 Sample   S7 plate1 10  6.370 0.819 12.85
#> 8 Sample   S8 plate1 10  7.612 1.174 15.42
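The per-group statistics in this table (N, Mean, SD, and CV as 100 × SD / Mean) can be sketched in base R. The toy data and the helper summarise_one are illustrative names, and grouping is done on a single column for brevity; this is not the package's implementation.

```r
# Illustrative sketch only: grouped N, Mean, SD and CV in base R.
df <- data.frame(
  type      = rep(c("S1", "S2"), each = 3),
  estimated = c(10, 12, 14, 20, 20, 23),
  stringsAsFactors = FALSE
)

# Hypothetical helper returning the four statistics for one group.
summarise_one <- function(v) {
  c(N = length(v), Mean = mean(v), SD = stats::sd(v),
    CV = 100 * stats::sd(v) / mean(v))
}

# One row per group, one column per statistic.
res <- t(sapply(split(df$estimated, df$type), summarise_one))
print(round(res, 3))
```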

11. Function: pvalue

pvalue() helps to calculate significance by t-test on the result dataframe. The syntax is pvalue(dataframe, control, sigval). dataframe is the result of dfsummary. control is the group considered as the control. sigval is the p-value cutoff (a value below this is considered significant). For example:

pval<-pvalue(result, control="S8", sigval=0.05)
head(pval)
#>       id type  label  N   Mean    SD    CV   pvalue significance
#> 1 Sample   S8 plate1 10  7.612 1.174 15.42 control              
#> 2 Sample   S1 plate1 10 19.561 1.465  7.49  < 0.001          Yes
#> 3 Sample   S2 plate1 10 18.536 1.141  6.15  < 0.001          Yes
#> 4 Sample   S3 plate1 10 15.670 2.194 14.00  < 0.001          Yes
#> 5 Sample   S4 plate1 10 13.362 1.026  7.68  < 0.001          Yes
#> 6 Sample   S5 plate1 10 12.043 1.359 11.29  < 0.001          Yes
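Because the input holds only N, Mean and SD per group, a t-test has to be computed from summary statistics. The sketch below does this with Welch's unequal-variance formula; which t-test variant the package actually uses is not stated here, so the choice of Welch and the helper name welch_p are assumptions. Applied to S1 versus the control S8 from the table above, it gives a p-value below 0.001, in line with the output.

```r
# Illustrative sketch only: two-sided Welch t-test from summary statistics.
# Hypothetical helper, not part of the package.
welch_p <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  # t statistic for the difference in means.
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite degrees of freedom.
  dof <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  2 * stats::pt(-abs(t_stat), dof)
}

# S1 (Mean 19.561, SD 1.465, N 10) against control S8 (7.612, 1.174, 10).
p <- welch_p(19.561, 1.465, 10, 7.612, 1.174, 10)
print(p < 0.001)
```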

Vignette of package bioassays

Anwar Azad Palakkan, Jamie Davies
